Wednesday, September 30, 2009

SQL Server Collation

Server and Database Collation on SQL Server is something that you never really give much thought until you run into a problem with it. I just happened to run into a problem with it today that caused me some pain and pointed out some minor flaws in my t-sql code. The problem turned out to be a case-sensitivity problem because of the server collation. Someone had installed SQL Server 2000 and set the collation to be case-sensitive instead of the default of case-insensitive. On SQL Server 2000, the collation can only be set for the server during installation. So unless you want to re-install SQL you're stuck with it. On SQL 2005, you can change it without re-installing the server but it's not much easier. There are plenty of blogs out there explaining how to do it. The real problem seems to be if you've designed a database with the wrong collation and want to go back and change it. I haven't tried it but the basic idea I got was that you script out the database objects and and re-create them and then import all your data into the new structure...but I haven't tried this nor do I want to. An easier solution is to take a look at the collation before you build the server and all the objects in the database(s).

Here is what I'm rambling about, lets look at the collations and explain what they mean. There are a ton of them, but I'm going to cover the default and its cousin that caused me trouble today. By default SQL Server, at least the English version, uses "SQL_Latin1_General_CP1_CI_AS" as the collation for the server and that is inherited by all the database which are then created on that server unless you specify otherwise. The "CI" in there means "case-insensitive", the other version has a "CS" for "case-sensitive" or "SQL_Latin1_General_CP1_CS_AS".

So what is the big deal over case-insensitive and case-sensitive you ask? I'm not 100% sure why you would want case-sensitive on your database server. On the upside, all of your variables, columns names, text in your where clauses have to be the exact case of whatever you're querying. On the downside, it can make for some tricky debugging later on and drive you crazy trying to figure it out. Here is why, you can could have multiple variables...yes multiple variables with the same name but variations of case which are unique variables. So @DbName, @dbname and @dbName would all be different variables. This is what caused my problem today, I had some stored procedures that I wrote and I guess I forgot the exact case of my variable throughout the code where I used it 20 something times. I declared the variable as @DbName varchar(50) and then called it in one place as @DbName and another as @dbname and I got back an error saying that I needed to declare my variable. The same thing had happened with column names and trying to query a database name. I fixed the case-sensitivity issues and got everything working.

What is the point of this? Choose your collation carefully when building your SQL Server and designing your databases. Case-sensitive collation will force every query, variable, columns, etc to be the exact case of whatever data you're dealing with and allow multiple variations of the same variables to be specified. Case-insensitive will allow variables and queries to be written without regard to the case of the query, columns and variables as well as ensuring that a variable name can only be declared and used once.

No comments:

Post a Comment