SQL Server 2000 Better Practices

 

More often than not people search for coding practices. Coding practices are widely available at tons of sites. But here are a set of practices that are better practices of coding while using SQL Server 2000. The views listed below are from my experience and I am sure any SQL Server Guru might not argue it otherwise. You can also consider these as guidelines for development in SQL Server 2000. Hope you get good mileage out of this article.

 
Note: In this article I’ll assume that you already know the T-SQL syntax and we are working with SQL Server 2000.
 

1. Normalize your tables

 

There are two common excuses for not normalizing databases: performance and pure laziness. You’ll pay for the second one sooner or later; and, about performance, don’t optimize what’s not slow. And, more frequent than the inverse, the resulting design is slower. DBMS’s were designed to be used with normalized databases and SQL Server is no exception, so design with normalization in mind.

 
2. Avoid using cursors
 

Use cursors wisely. Cursors are fundamentally evil. They force the database engine to repeatedly fetch rows, negotiate blocking, manage locks, and transmit results. They consume network bandwidth as the results are transmitted back to the client, where they consume RAM, disk space, and screen real estate. Consider the resources consumed by each cursor you build and multiply this demand by the number of simultaneous users. Smaller is better. And good DBAs, most of the time, know what they are doing. But, if you are reading this, you are not a DBA, right?

 
Having said this the other question that comes is, If I were to use cursors then …. ? Well here are my 20Cents on cursor usage. Use the appropriate cursors for the job in hand.
 
  •  
  • Don’t use scrollable cursors unless required
  •  
  • Use readonly cursors if you do not intend to update. This would be 90% of the situations.
  •  
  • Try to use Forward Only cursor when using cursors
  •  
  • Don’t forget to close and deallocate the cursors used.
  •  
  • Try to reduce the number of columns and records fetched in a cursor
     
    3. Index Columns
     
    Create Index on columns that are going to be highly selective. Indexes are vital to efficient data access; however, there is a cost associated with creating and maintaining an index structure. For every insert, update and delete, each index must be updated. In a data warehouse, this is acceptable, but in a transactional database, you should weigh the cost of maintaining an index on tables that incur heavy changes. The bottom line is to use effective indexes judiciously. On analytical databases, use as many indexes as necessary to read the data quickly and efficiently.
     
    Now a classic example is DONOT index an column like "Gender". This would have a selectivity of 50% and if your table is having 10 Million records, you can be least assured that using this index you may have to travel half the number of rows … Hence maintaining such indexes can slow your performance.
     
    4. Use transactions
     
    Use transaction judiciously. This will save you when things get wrong. Working with data for some time you’ll soon discover some unexpected situation which will make your stored procured crash. See that the transaction starts as late as possible and ends as early as possible. This would reduce the requirement to lock down the resources while accessing.
     
    5. Analyze deadlocks
     
    Access your tables on the same order always. When working with stored procedures and transactions, you may find this soon. Any SQL programmer / database analyst would have come across this problem. If the order changes then there would be a cyclic wait for resources to be released and the users would experience a permanent hang in the application. Deadlocks can be tricky to find if the lock sequence is not carefully designed. To summarize, Deadlock occurs when two users have locks on separate objects and each user is trying to lock the other user’s objects. SQL Server automatically detects and breaks the deadlock. The terminated transaction will be automatically rolled back and an error code 1205 will be issued.
     
    6. GOTO Usage
     
    Avoid using the infamous GOTO. This is a time-proven means of adding disorder to program flow. There are some cases where intelligent use of GOTO is preferable to dogmatically refusing to use it. On the other hand, unintelligent use of GOTO is a quick ticket to unreadable code.
     
    7. Increase timeouts
     
    When querying a database, the default timeout is often low, like 30 seconds. Remember that report queries may run longer than this, specially when your database grows. Hence increase this value to an acceptable value.
     
    8. Avoid NULLable columns
     
    When possible, normalize your table and separate your nullable columns. They consume an extra byte on each NULLable column in each row and have more overhead associated when querying data. It will be more flexible and faster, and will reduce the NULLable columns. I’m not saying that NULLs are the evil incarnation. I believe they can simplify coding when "missing data" is part of your business rules.
     
    9. TEXT datatype
     
    Unless you are using it for really large data. The TEXT datatype is not flexible to query, is slow and wastes a lot of space if used incorrectly. Sometimes a VARCHAR will handle your data better. You can also look at the "text in row" feature with the table options for SQL Server 2000. But still I would stick to the first statement, Avoid using them on first place.
     
    10. SELECT * Usage
     
    Its very difficult to get out of this habit, but believe me this is very essential. Please DONOT use this syntax. Always qualify the full list of columns. Using all columns increases network traffic, requires more buffers and processing, and could prove error prone if the table or view definition changes.
     
    11. Temporary tables usage
     
    Unless strictly necessary. More often than not a subquery can substitute a temporary table. In SQL Server 2000, there are alternatives like the TABLE variable datatype which can provide in-memory solutions for small tables inside stored procedures too. If I were to recollect some of the advantages of using the same:
     
  •  
  • A table variable behaves like a local variable. It has a well-defined scope, which is the function, stored procedure, or batch in which it is declared. Within its scope, a table variable may be used like a regular table.
  •  
  • However, table may not be used in the following statements: INSERT INTO table_variable EXEC stored_procedure SELECT select_list INTO table_variable statements.
  •  
  • Table variables are cleaned up automatically at the end of the function, stored procedure, or batch in which they are defined.
  •  
  • Table variables used in stored procedures result in fewer recompilations of the stored procedures than their counterparts temporary tables.
  •  
  • Transactions involving table variables last only for the duration of an update on the table variable. Thus, table variables require less locking and logging resources
     
    12. Using UDF
     
    UDF can replace stored procedures. But be careful in their usage. Sometimes UDFs can take a toll on your applications performance. And UDFs have to prefixed with the owners name. This is not a drawback but a requirement. I support usage of SPs more than UDFs.
     
    13. Multiple User Scenario
     
    Sometimes two users will edit the same record at the same time. While writing back, the last writer wins and some of the updates will be lost. It’s easy to detect this situation: create a timestamp column and check it before you write. Code for these practical situations and test your application for these scenarios.
     
    14. Use SCOPE_IDENTITY
     
    Dont do SELECT max(ID) from MasterTable when inserting in a Details table. This is a common mistake, and will fail when concurrent users are inserting data at the same instance. Use one of SCOPE_IDENTITY or IDENT_CURRENT. My choice would be SCOPE_IDENTITY as this would give you the identity value from the current context in perspective.
     
    15. Analyze Query Plans
     
    The SQL Server query analyzer is a powerful tool. And surely is your friend, and you’ll learn a lot of how it works and how the query and index design can affect performance through it. Understand the execution plan that the execution plan window shows for potential bottlenecks.
     
    16. Parameterized queries
     
    Parameterize all your queries using the sp_executesql. This would help the optimzer to cache the execution plans and use the same when requested the second time. You can cache-in the time required to parse, compile and place the execution plan. Avoid using of D-SQL as much as possible.
     
    17. Keep Procedures Small
     
    Keep SPs small in size and scope. Two users invoking the same stored procedure simultaneously will cause the procedure to create two query plans in cache. It is much more efficient to have a stored procedure call other ones then to have one large procedure.
     
    18. Bulk INSERT
     
    Use DTS or the BCP utility and you’ll have both a flexible and fast solution. Try avoiding use of Insert statement for the Bulk loading feature, they are not efficient and are not designed for the same.
     
    19. Using JOINS
     
    Make sure that there are n-1 join criteria if there are n tables.
    Make sure that ALL tables included in the statement are joined. Make sure that only tables that
     
  •  
  • Have columns in the select clause
  •  
  • Have columns referenced in the where clause
  •  
  • Allow two unrelated tables to be joined together are included.
     
    20. Trap Errors
     
    Make sure that the @@ERROR global variable is checked after every statement which causes an update to the database (INSERT, UPDATE, DELETE). Make sure that rollbacks (if appropriate) are performed prior to inserting rows into an exception table
     
    21. Small Result Set
     
    Retrieving needlessly large result sets (for example, thousands of rows) for browsing on the client adds CPU and network I/O load, makes the application less capable of remote use, and limits multi-user scalability. It is better to design the application to prompt the user for sufficient input so queries are submitted that generates modest result sets.
     
    22. Negative Arguments
     
    Minimize the use of not equal operations, <> or !=. SQL Server has to scan a table or index to find all values to see if they are not equal to the value given in the expression. Try rephrasing the expression using ranges:
     
    WHERE KeyColumn < ‘TestValue’ AND KeyColumn > ‘TestValue’
     
    23. Date Assumption
     
    Prevent issues with the interpretation of centuries in dates, do not specify years using two digits. Assuming dates formats is the first place to break an application. Hence avoid making this assumption.
     
    24. SP_ Name
     
    DONOT start the name of a stored procedure with SP_. This is because all the system related stored procedures follow this convention. Hence a valid procedure today may clash with the naming convention of a system procedure that gets bundled with a Service pack / Security patch tomorrow. Hence do not follow this convention.
     
    25. Apply the latest Security Packs / Service Packs
     
    Even though this point applies to the network and the database administrators, it is always better to keep up-to date on the software’s. With the "slammer" virus and many more still outside, it is one of the best practices to be up-to date on the same. Consider this strongly.
     
    26. Using Count(*)
     
    The only 100 percent accurate way to check the number of rows in a table is to use a COUNT(*) operation. The statement might consume significant resources if your tables are very big because scanning a large table or index can consume a lot of I/O. Avoid these type of queries to the maximum. Use short circuiting methods as EXISTS etc. Here is one other way you can find the total number of rows in a table. SQL Server Books Online (BOL) documents the structure of sysindexes; the value of sysindexes.indid will always be 0 for a table and 1 for a clustered index. If a table doesn’t have a clustered index, its entry in sysindexes will always have an indid value of 0. If a table does have a clustered index, its entry in sysindexes will always have an indid value of 1.
     

    SELECT object_name(id) ,rowcnt
    FROM sysindexes
    WHERE indid IN (1,0) AND OBJECTPROPERTY(id, ‘IsUserTable’) = 1

     
    27. Ownership Chaining
     
    Try using this feature (available from SQL Server 2000 SP3), for permission management within a single database. Avoid using this feature to manage permissions across database.
     
    28. SQL Injection
     
    Security has been a prime concern for everyone. Hence validate all the incoming parameters at all levels of the application. Limit the scope of possible damage by permitting only minimally privileged accounts to send user input to the server. Adding to it, run SQL Server itself with the least necessary privileges.
     
    29. Fill-factor
     
    The ‘fill factor’ option specifies how full SQL Server will make each index page. When there is no free space to insert new row on the index page, SQL Server will create new index page and transfer some rows from the previous page to the new one. This operation is called page splits. You can reduce the number of page splits by setting the appropriate fill factor option to reserve free space on each index page. The fill factor is a value from 1 through 100 that specifies the percentage of the index page to be left empty. The default value for fill factor is 0. It is treated similarly to a fill factor value of 100, the difference in that SQL Server leaves some space within the upper level of the index tree for FILLFACTOR = 0. The fill factor percentage is used only at the time the index is created. If the table contains read-only data (or data that very rarely changed), you can set the ‘fill factor’ option to 100. When the table’s data modified very often, you can decrease the ‘fill factor’ option to 70 percent, for example. Having explained page splits in detail I would warn you in over looking at this point because more free space means that SQL Server has to traverse through more pages to get the same amount of data. Hence try to strike a balance and arrive at an appropriate value.
     
    30. Start-up Procedures
     
    Verify all the stored procedures for safety reasons.
     
    31. Analyze Blocking
     
    More often than not any implementers nightmare would be to see a blocking process. Blocking occurs when a process must wait for another process to complete. The process must wait because the resources it needs are exclusively used by another process. A blocked process will resume operation after the resources are released by the other process. Sometimes this can become cyclic and the system comes to a stand still. The only solution is to analyze your indexing strategy and table design. Consider these points strongly.
     
    32. Avoid Un-necessary Indexes
     
    Avoid creating un-necessary indexes on table thinking they would improve your performance. Understand that creating Indexes and maintaining them are overheads that you incur. And these surely do reduce the throughput for the whole application. You can create a simple test on a large table and find it for yourself how multiple indexes on the same column decrease performance.
     
    33. Consider Indexed Views
     
    Sometimes we would require an view to be indexed. This feature is bundled with SQL Server 2000. The result set of the indexed view is persist in the database and indexed for fast access. Because indexed views depend on base tables, you should create indexed views with SCHEMABINDING option to prevent the table or column modification that would invalidate the view. Hence using them can reduce a lot of load on the base tables but increases the maintainability.
     
    34. WITH SORT_IN_TEMPDB Option
     
    Consider using this option when you create an index and when tempdb is on a different set of disks than the user database. This is more of a tuning recommendation. Using this option can reduce the time it takes to create an index, but increases the amount of disk space used to create an index. Time is precious, disk is cheaper.
     
    35. Reduce Number of Columns
     

    Try to reduce the number of columns in a table. The fewer the number of columns in a table, the less space the table will use, since more rows will fit on a single data page, and less I/O overhead will be required to access the table’s data. This should be considered strongly by applications that talk across different machines. More the unwanted data passed more is the network latency observed.

    Source : http://www.extremeexperts.com/SQL/Articles/BestPractices.aspx

    Vinod Kumar (vinodk_sct@hotmail.com)
    Microsoft India Community Star
    http://www32.brinkster.com/sqlvinod/

    Advertisements

    Leave a Reply

    Fill in your details below or click an icon to log in:

    WordPress.com Logo

    You are commenting using your WordPress.com account. Log Out /  Change )

    Google+ photo

    You are commenting using your Google+ account. Log Out /  Change )

    Twitter picture

    You are commenting using your Twitter account. Log Out /  Change )

    Facebook photo

    You are commenting using your Facebook account. Log Out /  Change )

    w

    Connecting to %s