Tuesday, December 31, 2013

What Matters Most Accredited Knowledge or Time Served ?

   I want to actually apologize for the layoff as well as the general vagueness of some of the posts. I guess I am more the cowboy then the surgeon after all.  I can't completely dismiss the layoff on the holidays and the continuing job search. I'll try to, but that isn't the truth.
     I have added reading and studying for some certifications to my daily routine. It has been very rewarding in more ways then one. If one is to weigh qualifications (i.e. certifications , school and such)  verse professional experience, real world experience should always win.
     I was certified quite sufficiently before having my first I.T. position. I will never dismiss certifications. Approximately ten years later, I know certifications are a glorified rubber stamp that is supposed to indicate you could build the best mousetrap or survive the burning of Rome.
     I learn every time I read. I learn from re-reading. I can attribute my ability to navigate through some of the outages I have seen in the last five or six years from that reading. However there are several things a good professional does that the certs don't test for. There are things that the good professionals do even in the heat of failing systems or unexplained application issues.
     In my first position I was given documentation for the processes I inherited. I was trained in those processes . Then as I maintained and further develop those processes, I was in charge of maintaining those processes and their documentation.
      My trainer gave me the understanding to expect a system or a process to fail. To prepare for an unexpected outcome. Simple making copies of files or knowing where to restart a failed process. Or knowing how to restore data to resume running a process.  
    If this kind of professional logic was included in your texts they might triple in size and they might finally be worth some of those price tags. I don't feel many, if any, books really teach you how to be a better all around I.T. professional. There are endless books that will teach in great detail about all kinds of amazing technologies. There are design books and other reference books.
   It is your work practices and how you apply them routinely that will best serve you when the chips are down. Be it a deadline for an application or a system failure of any nature.
  I recently experienced an individual who muscled through his I.T. career with a great deal of effort. He appeared to be a very hard worker . He spoke very confidently in his plans . So much so that he seemed knowledgeable.  I now feel like he was actually relying on a few tricks to get by and was not using best practices and textbook processes in many cases. I feel that some earlier independence and some false success led him to believe his practices were correct. It is this kind of negative experience that contradicts the previous thought that work experience trumps education in the I.T. field.
    A bad manager from my past had a very narrow definition of what "I.T. work" was supposed to mean. I didn't agree with him, but i have grown to see his point. I.T. work isn't a craftsman job.  A truly talented professional will deliver a quality product. It is not however like having sculpted for years and being able to make great pieces of art. It is like having a measurable scientific understanding of how things behave based on circumstance.      
     In short a good professional experience will always teach you more than a solid reading of the best book. The common ground is in the quality of the resource. I have had the luxury of working with quality professionals whose technical ambitions rival my own in every way. I love to read. I have chosen great texts. I am always willing to learn . I have seasoned all of this "book knowledge" with years of real hands on experience.
    I find it helpful to ask a lot of questions . I even ask questions I technically know an answer for , but I want to know my peer's insight. If your peers are patient enough, you can even play devil's advocate and bounce the different ideas off of each other. Good professionals don't have to share the same processes. They just have to be able to clinically justify why they took their approach.

Wednesday, December 4, 2013

Database Optimization : My Practices versus Best Practice

    "Best Practice" is more than a buzzword. It gets thrown around like one. You can google it but my take on best practice is its the actions that under normal circumstances will allow for best performance and disaster recovery preparedness. We should all aspire to build our environments in normal circumstances and be prepared for every possible setback that could or will eventually occur.
    When it comes to SQL Server normal circumstances can be very expensive and a lot of companies aren't on-board with these costs. We are going to try and imagine that isn't a concern . It may mean however you have to use judgement if following these practices.
      The first thing I look at is file configuration. It is my professional observation that sectioning Best practice says the system , temp, data and log files should all have separate drives. These drives should be separate from the O.S. Drive as well. If you cant have individual drives your priority should be to place the tempdb files on their own drives. The excessive writes caused by many applications create reason to isolate the I/O  requirements for these files.
      Second the data and the log files should be separated to the own drives. I believe the files should have no less than 15% unallocated space in them. So you have room on the drives for when the files fill and grow . When that occurs you should be able to manually grow the file to get the file back to an optimal unallocated percentage.
     I prefer to use multiple data files on any database over 10GB. This isnt really my rule of thumb. Its just what I have done in the past dealing with databases of all sizes. There are some thoughts about the number of files being relative to the number of C.P.U.. I do  follow this with regard to the tempdb, but I dont  follow this with database files. I factor in how many databases are on the server. I consider how which databases are more active then others. I believe the point of matching C.P.U. to file count by a ratio is to better manage the processes.    If a query needs to read data from the file it needs a C.P.U. to do some of that work.
   You may want to consider maintainability when you decide how many data files to make. It is easy to get carried away cutting a large database into many small pieces. If you need to do regular restores of a database with a great number of files it is more tedious than a database of only a few files.
   Some people might even be questioning the need to break the database into multiple files.  I take the stance that SQL Server needs how to properly identify where its data is located. If give it smaller files to find the data in you are intuitive lower the I/O needed to complete your queries.


Wednesday, November 27, 2013


     On my short list of SQL Server configurations that need to be set specifically to your servers needs is MAXDOP. It is the maximum number of C.P.U. that a query can use if it is going to cost more than a few seconds.
      The degree of  C.P.U. parallelism allowed by the server is often looked at like a turbo button for some of your system's heavy lifting.It can also be perceived as a freeway that has mile long backups at rush hour.
  While I have faith in this concept, many of my specific experiences have eventually whittled down to a MAXDOP of 1. This is effectively the off switch for this feature. Out of the box SQL Server set MAXDOP to zero, this is the full throttle setting. It allows any long running query to use all available C.P.U..
   It needs to be noted that SQL Server uses the configured number of C.P.U. when the cost threshold is passed. The default setting for cost threshold is 5 seconds. You need to consider what percentage of your queries run long enough to work using extra C.P.U..
   You have the tools to determine how long any query in your system takes to run. The surgeons out there will figure out what percentage of the queries take how many seconds to run. The cowboys will either estimate how many small transactions are happening versus the total time spent running large queries.
   You may read that  MAXDOP should be set to half the number of C.P.U up to 8. The question here is how likely is it that multiple long running queries will be running on the server at the same time. Too many simultaneous long running queries will cause excessive  CXPACKET or SOS_SCHEDULER_YIELD waits. You will then have to decide if you can raise the cost or if you have to lower the MAXDOP setting. Let the trial and error begin.
   The easiest place to make use of MAXDOP at its best is on your data warehouses and report servers. It is here that the processes are more likely scheduled or in sequence. Heavy lifting won't collide with other long running queries.
    The best advice when making any configuration change is to make small changes and carefully monitor the outcome. Best of Luck.

Wednesday, November 20, 2013

Database Troubleshooting: The playbook for outside the box

     Some time back  a developer from one of our products was brought into help resolve a locking issue we were experiencing. I remember him perusing through much of the server settings and muttering they were just about all default.
     It needs to be said for the 6 or so databases. Only one of which that was larger than 100 GB. This database was the collected data of 15 years of using one version of the application or another. The server had 128GB of RAM and 32 CPU.   I can confidently and callously say that even just out of the box , my server was more than able to handle anything that was asked of it.
      Most of the times we think about database and application issues as long running queries. Occasionally a query seems long running or an application freezes up because another call is blocking it. It seems logical that the offending query is itself long running. We all have our ways of identifying the offender. From the simplest Activity Monitor to the more involved D.M.V queries. My preferred method is something I modified from a Pinal Dave SqlAuthority article. I am a big fan of his articles and many of my searches often refer back to his articles.
      On this particular occasion the block was showing a query completely unrelated to the growing number of blocked users.  All of the users were attempting to remove purchase order request lines created by orders that were no longer needed. The blocker was inactive in the application. Another tool in your immediate reach must be sp_lock.  I created a script that passed the parent blocker to the sp_lock call. This call showed me in no uncertain terms that the inactive lead blocker was holding a complete lock of the PO Request  table. This is what was preventing the other users from completing their order modifications. The application was trying to delete records from the table.
       I was convinced there was some connection being left open or some transactions not being committed. The developer did not agree.We were the only client complaining about this issue. Maybe we were the only client mismanaging our p.o. requests.
       "Maybe" isn't an I.T. word. That is perhaps the most memorably sentiment of the worst I.T. Manager I ever had the displeasure of reporting to.
        Some months later the software company would quietly in the back pages of a release note mention an issue with an open dataset issue in one of the forms. This was as close to a confession as I was going to get.
   For the interim I had to lessen the impact of the blocking . I achieved this by removing the P.O. request lines proactively using other considerations. The less often someone confirming an order had request lines to delete the less likely it was to be blocked by another user who was inactive .
   I considered changing flags so the server would not escalate locks on memory contention and by number of locks. In the end I did not , I felt that I hadn't justified that this would have a real impact on the problem. I did change my scheduled restart of the server from a bi-weekly event to a weekly event.
   A restart to me is like punting. You haven't identified the problem. You have just thrown the good away with the bad. If there is a point to this pseudo-rant, it is that even when your resources have pointed you in pretty certain terms to the cause. If the other parties involved,  outside vendors or not, plead innocence you still have to MacGyver the situation.

Wednesday, November 13, 2013

Is it the Hardware or the Application ?

     As I am currently job hunting , I am being asked a small collection of baseline questions. The kind of technical questions that separate the real from the would be real. One of my favorites is how to tell whether a bottleneck is in the application or hardware resources.
      The interviewer is looking for what specific strategies you would take to determine the source. How do you prove to a developer that a slowness or locking issue is caused by the application ? How do you show your system administrator that the server memory or drive specifications are not cutting it with the high transaction order entry and billing systems ?
     I assume the process in question is some sort of data call. You need to find the latency. I like to break it down; either something is doing more than it should or something is just taking longer than it should to do it. The client makes the request. The application talks to the database. The database responds to the application. The application notifies the client. Hopefully your set up is comparable or this is merely a simplification of it.
      My network guys are infamous for saying "It's never the network.". I usually take that at face value. Looking to disprove it only after all other possibilities have been exhausted.
      The hardware guys have one of two opinions. The more common is denial. They'll tell you the hardware is more than sufficient to handle everything that is asked of it. The second opinion is an admission of guilt. Of course the hardware is insufficient; Its old and wasn't spec'd out right.
      The database guy , that's my normal role, is the most suspicious. If he can acquit his hardware of guilt , he still has a few other possible problems. Server settings such as memory allocation, File sizes and growth settings, database maintenance and poor indexing are all things that can cause performance issues. He needs to make sure the data call is only asking for what is truly needed at the time. He may have a lot of ground to cover to find the true source of the issue.
      The developer will also normally deny any responsibility. The internal developer is trying to protect the perceived quality of his work , and therefore his job. The vendor's developers are protecting the company product. You will be the only customer with this issue and they will not be able to recreate it.
      Getting back to the fact that this is an interview question, you can assume that the issue is happening under close to ideal conditions. You may want to clarify that with the interviewer, but the question makes more sense when the obvious has been confirmed.
      The first step is to find where the wait(s) are in the process. Running the process and a profiler will separate the application waits from the database waits.
      If the wait is determined to be in the database, then the activity monitor will show the kinds of waits that are impacting the process. The wait will narrow down is the issue is blocking , file contention , poorly written sql or a number of other causes.
      If the wait appears to be in the application you need to decide if it is hardware or code. The simplest way to look at it if the hardware resources are above recommended percentages then the issue can be addressed there . If there is no such elevated use of CPU or memory than the code is likely an issue.
      This may be too general to quote as an answer in an interview. Hopefully you can expand on it some and make the point. I hope to revisit some of these ideas in the future.