Ed Elliott's blog

SSDT: Unable to connect to master or target server.

  • Posted on: 12 June 2017
  • By: Ed Elliott

error in sssdt: Unable to connect to master or target server. the server displayed isn't the server but the database

Every now and then I come across this error in SSDT, normally when trying to publish and the odd thing is that the description never matches the actual cause (if you desperate for the cause it is because you can't connect). The thing I like about the description is the way it tries to tell you what is wrong and what server you are connecting to but it fails at both and instead tells you about an unrelated error and a database name instead of a server name.

What we have is:

"Unable to connect to master or target server '{0}'. You must have a user with the same password in master or target server '{0}'.\"

Twice it tries to tell you the server name but both time it actually tells you the database name. I thought I would dig into it a little with reflector to try and see where the error is coming from and whether or not it would ever show the servername. So in reflector I found the error and found where it was used. If we look in, what is surely to be everyone's favourite ssdt dll Microsoft.Data.Tools.Schema.Sql.dll, we can see that the error message is used in SqlDeploymentEndpointServer.OnInit and we have something like:


catch (ExtensibilityException exception)
{
Tracer.TraceException(TraceEventType.Verbose, TraceId.CoreServices, exception, "Error loading DSP families");
throw new DeploymentFailedException(string.Format(CultureInfo.CurrentCulture, DeploymentResources.InvalidServerEndpoint, new object[] { targetDBName }), exception);
}

they are indeed passing targetDBName into the InvalidServerEndpoint error message so yes indeed the error message will only ever show the database name.

I had a quick look at what can cause this and it is something to do with opening the SqlConnection which is wrapped in lots of retry logic that is different for Azure compared to other types of SQL - lots of interesting stuff maybe for another post but basically SSDT wasn't able to open a connection - check the server name, ports, database etc (i.e. maybe your default database is not available to that user), connect via ssms then when it works there come back to ssdt.

Footnote, I did think about raising a connect but then couldn't be bothered, if anyone does i'll be happy to vote for it!

SSDT DevPack + Visual Studio 2017

  • Posted on: 8 May 2017
  • By: Ed Elliott

I have upgraded the ssdt dev pack to support visual studio 2017 and fixed a couple of minor annoyances and have started to think about how to improve it going forward.

The first feature of sorts is the ability to clear the connection used by the quick deployer without having to restart visual studio.

Secondly I am pretty convinced that the main thing people use it for is the quick deploy, the tSQLt test builder and I use the keyword formatters so I have moved everything else to underneath a menu item saying "deprecated" - if anyone really wants one of those features then let me know and I will keep it but I will likely remove them at some point.

I am planning on bringing SQLCover into it at some point and I haven't really been using the mergeui part, I think the better thing is to use sp_generate_merge to generate a merge statement it is much more reliable. If you have a simple table then MergeUi might still be useful to you.

I will publish it to the market place so updates happen automatically, if you want a copy of it go to:

https://visualstudiogallery.msdn.microsoft.com/435e7238-0e64-4667-8980-5...

Before I published it I realised that when I wrote the code (a couple of years ago!) that I had taken a couple of shortcuts, one was to always use the 120 version of the parsers instead of whatever the project was set to so I decided to fix that first and then publish - looking over old code is never good ha ha.

It is now published so you should be able to install for vs 2017 and existing installs on 2015 will be upgraded (hope it doesn't break for anyone, there are lots more users than when I last published it!)

SSDT Dev in Visual Studio Code

  • Posted on: 27 April 2017
  • By: Ed Elliott

I have been quite interested by vs code and have been using it more and more recently. I use it for all my GO (#golang FTW) work and also powershell and I have been toying with the sql tools team's sql extension which is great. For a long time I have thought about bringing the SSDT experience to other IDE's like Jetbrains IntelliJ but because I have been using vscode quite a lot recently and separately I have been doing more and more javascript and typescript I thought it would be interesting to see how hard it would be to write a vscode extension that lets me build dacpac's.

The general goals of this are not to re-created the ssdt experience in visual studio but to provide a lighter, faster way of developing sql code, if I can have an extension that:

  • is fast
  • is light weight - memory is important for dev machines and 2 gb for a large db project is limiting
  • gives us the important things like pre,post deploy scripts, refactoring and the ability to generate dacpac

I am not really interested in providing ui's like the schema compare - for that use SSDT or spend some money on the Redgate tools.

I am also not interested in replacing what the sql tools team are doing, I am happy to leave them to do the harder, important but less interesting things to me like t-sql formatting so with that in mind I have started a new project that is very hacky at the moment, more an experiment to see if it will work but a vs code extension that builds dacpacs:

https://github.com/GoEddie/vscode-ssdt/

This is basically just a wrapper around the DacFx so there shouldn't be anything too hard and also because it is windows only for now (until the DacFx is cross platform it will only ever be windows, but I hold out hope for cross platform DacFx one day!).

This works similarly to the sql tools team extension in that there is a .net app that is called by the vs code extension (typescript running on node.js) so if you wanted to try this, download the repo, run the SSDTWrap exe (not under a debugger or you will face t-sql parsing first chance exception performance hell). Then in vs code open the folder "src\s2" and the "extension.ts" file and F5 - this will open a new vs code window - open a folder with your .sql files and it will create a t-sql model and report any errors.

If you do ctrl+shift+p to open the command pallette and then do "build dacpac" it will generate a dacpac for you from the sql files. You will need to put this ssdt.json file in the root of the directory you open in vscode:


{
"outputFile": "c:\\dev\\abc.dacpac",
"SqlServerVersion": "Sql130",
"references":[
{
"type": "same",
"path": "C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\Common7\\IDE\\Extensions\\Microsoft\\SQLDB\\Extensions\\SqlServer\\140\\SQLSchemas\\master.dacpac"
}
],
"PreScript": "",
"PostScript": "",
"RefactorLog": "",
"ignoreFilter": "*script*",
"PublishProfilePath": "C:\\dev\\Nested2\\Nested2.publish.xml"
}

It doesn't really support a lot of things at the moment but I will add in the support needed to build a dacpac including refactorlog.xml, pre/post deploy scripts and references that we all know and love.

I tested with a folder of 2000 procedures, I tried testing 10,000 but I couldn't get ssdt to load them into a project without crashing (on a top of the range i7, 32gb ssd etc laptop) - in the end I settled for 2000 procs and to build the dacpac the times were:

App time (milliseconds)
Visual Studio / SSDT 5630
VS Code 2051

so as well as handling larger projects it is faster as well, a small project (one proc/table) was about 17 seconds to build the dacpac.

Anyway, it is all a bit of fun and pretty hacky at the moment but I like using vs code anyway and am finding it much more light weight than visual studio so will likely invest some more time in it.

If you feel like trying it out, good luck :)

Updating TSqlModels (DacFx)

  • Posted on: 26 April 2017
  • By: Ed Elliott

This one is for the DacFx nuts out there, it can't be a very big club but judging from the occasional emails I get about it, the quality is very high ha ha.

If you have a TSqlModel and you want to make a change to it, you have a couple of choices:

- Create a new model and copy over everything
- Use AddOrUpdateScript

AddOrUpdate can only update scripts that you add (i.e. Know the name of) so if you get the model from a dacpac you are out of luck.

I recently wanted to remove a file from a model I created so what I ended doing was to re-add a script with this contents:


--

I didn't try an empty file but this let me remove objects from the model quite nicely.

On another note I found something quite interesting that I missed when building a model.

When you parse a script you get a list of parse errors and the script object, you can add the script object to the model and call validate and even though you got parse errors the model validate might still pass (or parse waa haa haa haaaa).

The total errors are script parse errors plus model validation errors which seems obvious now but I missed it.

Hope it helps someone.

Happy DacFx'ing everyone!

Tags: 

My SQL Server Development Team Maturity Levels

  • Posted on: 24 April 2017
  • By: Ed Elliott

A teams maturity shows in its choice of tools.

I have seen quite a few different development teams in wildly different environments and the single fact that really stands out is that you can tell how good a team is by the tools that they use. It isn't always the specific choice of which tools they use, although that can be important, it is the fact that they evaluate and chose to either use or ignore new tools.

This is basically my personal maturity model for sql server developers, I think it is quite important because it is a measure of effectiveness of a team. It should be pointed out that some teams have no need to be effective whereas other teams are vital to how an organisation runs.

If we take a few examples, the first shows where a team has no need to be mature at all:

Team one, no one cares

Team one supports a vendor supplied application and the database is SQL Server, the vendor supplies an upgrade script to run every 3 months and the team is not allowed to make any changes to the application. In this scenario, there isn't really any benefit in the customer having the database in source control - any problems are dealt with by the vendor and any scripts can equally be run in SSMS or sqlcmd.exe or a custom application. Even though the application is critical, the vendor supplies all support for the application and the team just need to keep it updated.

The second one, is a team that is important.

Team two, everyone cares

Team two develop and support an in-house electronic medical record application. It was first written in the late 90's and has evolved to a include a number of SQL Server databases and .net services. All the code from the .net services, the databases and the tooling that the developers use is critical and hacking together a release is not going to wash it when a nurse might not be sure if something needs to be adminstered to a patient or not.

Team three, not critical but the team care

Team three develop a website that generates revenue for the company. Revenue is the most important factor but the team has good people in it who care and want to do the right thing for the company, not at the loss of revenue but with the idea to increase revenue and deployments.

Maturity Levels

OK so this is pretty simple, we have these levels:

  • Low
  • Medium
  • High

Wow. Just WOW

That is an amazing list, how did you come up with it? Did it come from some phd study on the effectiveness of lists in the internet age? No.

So a little more detail...

Low

The low maturity team is one that uses basic tools or no tools at all. They probably do all their development in SSMS without any of the refactoring or intellisense helpers that are available. If they do use source control then their database is not likly deployable in any form just purely from source control.

This could well be perfect for team one, there is literally no need for them to spend any time on anything that other teams might require as an absolute basic. There is nothing wrong with this if the development and deployment is genuinely not important.

If you have a database and you don't care about it then the best maturity level for you is low. Don't feel bad about it, it is awesome in its own way and we love awesome.

Medium

This is where it starts to get a little bit exciting, the team probably has some tooling - they might have sql prompt for ssms or use another ide like jetbrains datagrip. For deployments they could use a tool such as redgate readyroll or perhaps they have written their own deployment tool.

Why is ready roll in the Medium maturity level?

The main reason is that in a medium maturity team deployments, both creating and managaging them are critical to how the team develops and deploys changes. Readyroll helps teams to generate deployments, it is a tool to manage changes and deployments.

Thinking about deployments and having deploying changes is not necessarily a bad thing - if you are not team one then you should absolutely be thinking about deployments and if you did have a low maturity it is a great step to getting more mature in your SQL Server development.

Why is datagrip in the Medium maturity level?

Ok, since you ask - if it was another database then I would put it in the advanced section but the tooling for SQL is great so unfortuntly it goes into Medium. If there was a genuine reason why a team used it I would be tempted to throw them a High maturity level bone but no guarantees.

High

For a high maturity team I want to see good IDE's being used - so SSDT or datagrip or something similar. It doesn't have to be SSDT but if not then there needs to be a valid reason why. I also want to see code checked in (not changes checked in) and then the code being built, tested and deployed either straight to production or possibly prepared to be deployed later. If it is to be deployed later I want to see a plan in place that will get them to continuous deployment.

Where do teams two and three fit?

I would hope between Medium and High - if not you have to ask questions as to why not. It is 2017 and there are pleanty of resources available out there.

Show me a chart

This is really begging for a cool matrix helping people show what all the criteria are (there must be more of course) and where they fit, but hey, this is a low-medium maurity blog so maybe one day in the future.

SQL Server Continuous Deployment "In a Box"

  • Posted on: 7 March 2017
  • By: Ed Elliott

What is this?

Well if you read the name aloud "SQL Server Continuous Deployment in a box" then, if I have done my work correctly choosing the title for the blog, give a hint :)

what is the big idea?

There is really some great tooling for SQL Server - second to none really when it comes to RDBMS's and setting up Continuous Deployment pipelines is actually pretty simple once you know which parts to plug together. I even did it in 55 minutes once (https://www.youtube.com/watch?v=9YJQTx3bPek).

What I wanted to do was to allow people to spin up a vm, install ssdt (or visual studio with ssdt), install a local dev instance of SQL Server (or use localdb), run a script and add some parameters and have everything they need to be able to make changes to some code and have that code automatically deployed to a production database.

Now a little word about that word "production", you could set this up to point to your production database but what I would suggest for the demo is that you use a copy of your production database or something which you will call "production" - the tools here can all be used in a real life setup but you wouldn't normally host everything on your development machine.

How does it work?

The idea is that anyone who can download from the internet can do this, so setup the pre-requisites (ssdt and sql) and then either clone the repo (https://github.com/GoEddie/SQLServer-Continuous-Deployment-In-A-Box/) or download the latest zip from:

https://github.com/GoEddie/SQLServer-Continuous-Deployment-In-A-Box/arch...

Note to get the zip, you don't need to use git or sign up for an account or anything (other than clicking the link)

Once you get that then extract the folder, open powershell as an administrator, change to the src folder and run these two little commands:

Unblock-File *.ps1
.\ContinuousDeploymentFTW.ps1

What does this do?

Unblock-File *.ps1 - removes a flag that windows puts on files to stop them being run if they have been downloaded over the internet.
.\ContinuousDeploymentFTW.ps1 - runs the install script which actually:

  • Downloads chocolatey
  • Installs git
  • Installs Jenkins 2
  • Guides you how to configure Jenkins
  • Creates a local git repo
  • Creates a SSDT project which is configured with a test project and ssdt and all the references that normally cause people problems
  • Creates a local Jenkins build which monitors your local git repo for changes
  • When code is checked into the repo, the Jenkins job jumps into action and...

If you check into the default branch "master" then Jenkins:

  • Builds the SSDT project
  • Deploys the project to the unit test database
  • Runs the tSQLt unit tests
  • Generates a deployment script for the "production" database

and what you have there is continuous delivery in a box, now I know that isn't what you were sold by the title but I wanted to show a couple of different approaches to this so if you use git to create a release branch and check-in on it by changing to the directory with the SSDT project in powershell and doing:

git checkout -b release

Make a change and then...

git add .

git commit -m "a change that will go straight to production \o/"

You will see that a "Release" jenkins build is created automatically because the job we set up initially is a "Jenkins Multi-branch pipeline" - don't worry about that but what you see is Jenins:

  • Builds the SSDT project
  • Deploys the project to the unit test database
  • Runs the tSQLt unit tests
  • Deploys the SSDT project to the "production" database

Nice hey?

Why the choice of technology?

SSDT - this doesn't need ssdtm you could do this with readyroll, dbup etc anything
Git - aren't most people moving to git nowadays?
Jenkins 2 - for the multi-branch pipelines which means it automatically creates builds from the Jenkinsfile which is checked into source control

Sounds hard to setup?

It isn't all you need to do is configure Jenkins, create a user and give my script the username and token and also the connection details to the unit test and production databases. If you like when you get the SSDT project you can import from your production database which will then be deployed to your unit test database, or you can leave it empty, or add a couple of objects - whatever suits you!

Prerequisites

I would create a VM, install SSDT or Visual Studio with ssdt in (2015 or 2017), install a local SQL Server 2008+ and restore a copy of your production database that should be it.

I made a video to show the awesoness of all of this:

https://the.agilesql.club/assets/videos/SQLCDINABOX.mp4

I made the video to see how much fun it was to make videos, it was very fun but this will be the only one ;)

Enjoy and good luck!

ScriptDom parsing and NoViableAltExceptions

  • Posted on: 2 March 2017
  • By: Ed Elliott

If you have ever tried to debug a program that used the TSql Script Dom to parse some T-SQL you will know that the process is extremely slow and this is due to the volume of NoViableAltExceptions (and others) that are thrown and then caught. Because these are first chance exceptions they are being handled and it is the way that the script dom interacts with Antlr and the Lexer that they use. When you debug a program what happens is you have two processes, process one is the debuger, this starts (or attaches) to process two, the debugee.

The debugger calls a windows function WaitForDebugEvent typically in a "while(true)" loop (everyone should write a windows debugger at some point in their lives you learn so much, in fact put down ssms and go write your first debugger loop: https://msdn.microsoft.com/en-us/library/windows/desktop/ms681675(v=vs.85).aspx). The debugee app is then run and when something interesting like an exception or a dll is loaded/unloaded the debuggee is paused (i.e. all threads stopped), then WaitForDebugEvent returns and the debugger can look at the child process and either do something or call WaitForDebugEvent again. Even if the debugger doesn't care about the exceptions the debugee is still paused and when you parse T-SQL under a debugger, even if you tell Visual Studio to ignore the exceptions, the debugee is still paused for every exception just so Visual Studio (or your home baked debugger) can decide that it wants to ignore that exception and the debugee is started up again.

What this means for an app that throws lots of first chance exceptions is a constant start, stop, start, stop which is so so painful for performance - it is basically impossible to debug a TSql Script Dom parse on a large project, I typically debug a project with like one table and one proc and hope it gives me everything I need or do other tricks like letting the parsing happen without a debugger attached then attach a debugger at the right point after the parsing has happened but then again I don't have to debug the TSql Lexer's!

So where is this leading?

I was wondering what effect these first chance exceptions had on T-SQL and even in normal operations where we don't have a debugger attached, is there something we can do to speed up the processing?

The first thing I wanted to do was to try to reproduce a NoViableAltException, I kind of thought it would take me a few goes but actually the first statement I wrote caused one:

"select 1;"

This got me curious so I tried just:

"select 1"

Guess what? no NoViableAltException the second time - this didn't look good, should we remove all the semi-colon's from our code (spoiler no!).

Ok so we have a reproducable query that causes a first chance exception, what if we parse this like 1000 times and see the times and then another 1000 times with the semi-colon replaced with a space (so it is the same length)?

Guess what? The processing without the semi-colon took just over half the time of the queries with semi-colons, the average time to process a small query with a semi-colon in took 700ms and the query without the semi-colon took 420ms so much faster but who cares about 300 milli seconds? it is less than 1 second and really won't make much difference in the overall processing time to publish a dacpac.

I thought I would just have one more go at validating a real life(ish) database so I grabbed the world wide importers database and scriptied out the objects and broke it into batches, splitting on GO and either leaving semi-colons or removing all semi-colons - when I had semi-colons in the time it took to process was 620 ms and there were 2403 first chance exceptions. The second run without semi-colons which would likely create invalid sql in some cases - took 550 ms and there were still 1323 first chance exceptions, I think if we could get rid of all teh first chance exceptions the processing would be much faster but ho hum - to handle the first chance exceptions you just need a fast CPU and not to be a process that is being debugged.

SqlPackage Deploy Performance - IgnoreXX are not your friend!

  • Posted on: 2 March 2017
  • By: Ed Elliott

Following on from yesterdays blog I was wondering about the comparison of objects that were the same and how the IgnoreWhitespace, IgnoreComments, IgnoreKeywordCasing and IgnoreSemiColonsBetweenStatements flags affected the comparison. To be fair I was only interested in IgnoreWhitespace but actually it turns out that those four are very closely related.

When the deploy happens, where a script in the source and target are compared the process is:

  • 1. Loads of things we will skip
  • 2. Any cmd variables in the scripts are replaced with their appropriate values
  • 3. If both the scripts are null - the comparison returns true. This has to be the best for performance but the worse for functionality ;)
  • 4. If one script is null but not the other then the comparison returns false. This actually has to be the best for comparison performance but worse for deploy performance!
  • 5. We then get a good old fashioned String.Equals, the standard .net compare goes: if both strings are not null and the lengths are the same do a check on each byte in the strings
  • 6. If the strings are equal we have a match, happy days no more action required

It is what happens if the strings do not match that it starts to get a bit more interesting, if the strings are not equal and any of those four ignore options are True then we then fall down into doing a further comparison but after the scripts have been normalized using the script dom and antlr which is an expensive operation in itself (this also happens to be my next topic!).

Once the normalization has been done we end up in the actual compare which goes like this:

  • 1. Turn the script into a stream of tokens
  • 2. If the token is a comment and ignore comments is set, skip it
  • 3. If the token is whitespace and ignore whitespace is set, skip it
  • 4. If the token is a semi-colon and ignore semi-colon's is set, skip it
  • 5. Then compare the tokens which itself does things like use IgnoreKeywordCasing and removes quotes around quoted identifiers - it isn't a straightforward String.Equals
  • 6. If any of the tokens don't match then it is a failure and the script needs to be changed

So what?

So blunt. Anyway, basically what this means is that the default options in the sqlpackage.exe are set to allow things like different cased keywords and whitespace and to allow that we end up taking longer to do deployments where we actually make use of the default features.

huh?

If you have a database with lots of code and you have the code in SSDT but you do things like change the comments when you deploy and rely on IgnoreComments (this is a real life secenario I have seen, someone adding a custom comment header) then you will have slower deployments and as slower deployments are the opposite of what we want you should:

  • Have the same code in your database as you have in your project
  • Have the same code, including the same case of keywords, comments and whitespace in your database that you have in your project
  • Disable the defaults and set IgnoreWhitespace, IgnoreComments, IgnoreKeywordCasing and IgnoreSemiColonsBetweenStatements all to false

What effect does it have?

If your database a project code are exactly the same, then no effect you neither gain nor lose anything.

If your database and code are different by comments, case, semi-colons etc and you have lots of files that are different then you will gain quite a bit. On my machine here I created a database with 1,000 stored procedures like "select '----'" (I used replicate to make it large) I then imported the procs into SSDT and added a space between the select and the text and did a deploy using sqlpackage (in fact I did a few to get an average time), with the default IgnoreWhitespace=true the deploy took about 24 seconds (remember this is on a machine following yesterdays recommendations. lots of memory, fast CPU and SSD) - when I removed the defaults and set them to false - firstly the deploy took 34 seconds because naturally it had to deploy the procs then re-running it took around 17 seconds - about 7 seconds from a 24 second deploy which i'll take.

The thing that you will really gain is that your project code and database will be the same which should really be the end goal, if you can honestly say that you have to have:

  • Different whitespace
  • Different keyword casing
  • Different semi-colons
  • Different comments

I would be inclined to find out why as it sounds like an interesting project :)

SSDT Deploy / Publish Performance

  • Posted on: 1 March 2017
  • By: Ed Elliott

Publishing dacpac's is a little bit of a pain when you have multiple databases, it can easily start to take minutes to hours to deploy changes depending on how many databases and the size of those databases. I wanted to understand more about the publish process and what we can do to speed it up as much as possible so I did some digging and this is a randomish post about how it all works and what we can do to make it faster.

Process Overview

Roughly speaking the process for deploying dacpacs is:

  • 1. Open dacpac (zip file)
  • 2. Open database
  • 3. Compare dacpac to database
  • 4. Come up with a list of changes
  • 5. Generate the T-SQL for the changes
  • 6. Run the deploy script

1. Open the dacpac (zip file)

The dacpac contains the xml representation of the model of the database and the original source code. To be able to deploy changes we need the original source code - the model contains the definitions of the objects such as which columns the table has and the parameters a stored procedure has but not the body of the stored procedure including comments.

The dacpac is a zip file, this means to read from it the deployment must create a file handle to the dacpac and uncompress anything it wants to read (or compress anything it wants to write), the larger the dacpac logically the longer it will take to decompress and also read from disk so there are our first two important things to note, the deploy will need:

  • CPU to decompress the contents of the dacpac
  • Fast disk as it has to read from disk

When you run the publish, by default the model is read from disk into memory, however, for very large dacpac's you might find that you don't have enough memory for it - a windows system with low memory is bad, two things happen:

  • The paging file is used
  • Applications are told to free memory

This is painful for an application and when you get paging, especially on non-ssd drives, you are going to find everything about the whole system slow.

If you have a large model in your dacpac and you don't have much memory then you will probably need to load the model from disk rather than into memory, but you best make sure you have an ssd to run it from!

For more details on how to do that see: https://redphoenix.me/2013/06/14/why-is-sqlpackage-using-all-the-build-s... (Richie Lee FTW!)

2. Open database

The database contains all sorts of goodies like the version that will be deployed to, the database settings and the definitions that will be compared to the dacpac to see what changes need to be made. If your SQL Server responds slowly then the publish will be affected by that so make sure your SQL Server box is up to scratch and expect slower deploy times on resource limited instances.

3. Compare dacpac to database

I had always wondered how they compared the two different objects so I bust out reflector and used it to have a poke about. What happens is a model is build of the source and target - a model is a representation of each object, it has its name, its properties and relationships to other objects for example a table has a relationship to its columns and each column has a relationship to its type (not a property of the type).

If you think about this it means, for each deploy we actually need to store the model of both the database and the dacpac so doubling the size requirements - this isn't to say an exact double of the size of the source code becauuse what it compares is things like the object properties and relationships but also the tokens that make up the body of the object, so there are a lot more objects that are created around the code. In short if your source code is 100 mb you will need some multiple of 2 * 100mb to upgrade an existing database - I am not sure if there are any figures to show what that multiple is but if you do a publish and you run low on memory or you get lots of garbage collection in your deploy then consider either storing the model on disk or throwing in some more memory.

There are a couple of interesting extra things that happen, in particular to do with the size of code in objects and how much to store in memory, to store in memory compressed or to write to disk. If for example you had a stored proc that was less that 500 characters long, the publish will store it in memory.

If the string representation of a procedure was over 500 characters but less than 16 thousand characters the it will be stored in memory but compressed first.

Finally if the stored proc (I am using procs as a example but I think it is any code unit minus the create + name so everything after "as") is over 16 thousand characters then a temp file is generated in your user temp folder and the contents are written to that.

So three different behaviours depending on the size of your objects:

  • Under 500 chars - you need memory
  • Over 500 chars, under 16,000 you need memory and CPU
  • Over 16,000 chars you need memory and fast disk for your temp folder

I created three demo databases and deployed them over and over, the first contains a couple of hundred procedures with slightly under 500 characters, the second with slightly over 500 characters and the last with slightly over 16,000 characters.

When I ran the publish this is what they looked like:

Under 500 char procs:


under 500 chars procs, fastest to deploy

Slightly over 500 char procs:

just over 500 chars procs, similar to under 500 chars but includes some time to compress some strings

What we can see here is that the under 500 char procs, takes the least amount of time but the interesting thing is the database with slightly over 500 chars procs has a little bit of extra umpf from the CPU - I have showed this by using the pink circle - both deploys follow the same pattern for sqlpackage.exe CPU usage - it goes from around 90% cpu up to 100% and then drops down sharply to around 80% and then back up to 90% but the second project with the slightly larger procs - also has an extra splurge of CPU which I am putting down to the extra compression that those strings are getting - it certainly tallies up, even if it is not scientific :)

Aside from the extra CPU usage they are pretty similar, the black highlighted line is garbage collection and the way this counter works is every time it changes, it is doing some garbage collection - the first version does I think 3 collections and the seconds does 4 which sort of makes sense as once the strings has been compressed the original strings can be thrown away.

It is when we start to look at the database with lots of objects over 16,000 characters long we see some interesting things and some other forces come into play:

over 1600 chars procs, lots of disk activity

The first thing to note is the garbage collection, there are like 8 changes to the line so 8 sets of garbage collection that happen which is expensive for a .net app. We get much more processor usage for a more prolonged time and we really start to see some disk activity, write activity just has one peak while the files are written but reading stays quite high throughout the whole process. It could be that it is sql reading from disk (my demo was from a single ssd with a local sql instance) but we can see after the sql cpu settles down the green disk read line stays high so it is unlikely to be pure sqlserver.exe.

What does this tell us?

Well if you have large objects, multi-thousand line stored procs then you better get yourself some serious ssd's to deploy from. If you have smaller objects then you'll need CPU and memory - don't scrimp here!

4. Come up with a list of changes

So when the properties and tokens in the source / destination object have been compared they are added to a list of modifications - this is fairly straight forward but that is not the final list of modifications because there are lots of options to customize the deployment such as ignoring certain things - what happens next is that there is some post-processing done on the list to remove anything that the options say should be removed.

We then get the opportunity to interrupt the deploy process with a contributor and remove steps as we want - if we do that anything we do is overhead so be warned, be fast in what you do or slow down the whole process!

There are some things like default constraints, these are stored differently in SQL Server to our project - so when a difference is found, as part of the post-processing each constraint is "Normalized" using the script dom and a visitor to find out if the changes are actually different or just appear different - this means that we don't keep getting constraints deployed over and over (a relatively new feature as this was something that used to happen a lot - a great new feature added by the SSDT BTW!) but it does mean for every default constraint we need to go through this normalization using the script dom which isn't exactly lightening fast.

Takeaways from this are that the deploy does a lot of things to help us - if it was a straight string comparison then it would be faster but the deploy options would not be as configurable. The more objects you have, the longer the deploy will take!

5. Generate the T-SQL for the changes

6. Run the deploy script

I'm going to stop this post here - I think these are a bit more straight forward, the more things you have to change, the more time it will take :)

Summary

If the deploy is slow, use perfmon measure the cpu, disk and garbage collection of sqlpackage.exe (or whatever is doing the deploy) - have a think about whether streaming the model from disk would be better and if you are CPU bound add a more powerful CPU, if you are low on memory, add more memory and if you are on an old slow disk - put it on an ssd.

If you are low on CPU you might be tempted to add more CPU's - understand that a single deploy happens in serial so adding more CPU's won't make a deploy quicker if this is the only thing running on a box - if it is fighting for resources then more CPUs may help.

SSDT and Friends - .net meetup video

  • Posted on: 10 February 2017
  • By: Ed Elliott

I did a talk at the london .net meetup if you want to get an overview of what SSDT is and how to get started then I would recommend it:

https://skillsmatter.com/skillscasts/9274-londondot-net-january-meetup

This was aimed at .net developers rather than DBA's so there isn't much talk about "why you should use source control" etc as everyone in the room used source control already :)

Pages