Since getting back from TechEd things have been a bit crazy. The rest of the week there was a blast. I got to meet many more community members and catch some really interesting presentations. I am currently fascinated with the SQL Server 2008 Data Mining tools and am in search of a good problem. I'd like to turn this into a presentation for the local User Group.
One of the other highlights was using the Certification Testing at the conference. After sorting out some issues with the name on my certification, I was able to finally get this logo:
[Image: certification logo]
That was not a pleasant test. The tool just has so many facets to it and it's really impossible to have used all of them. But the test prep center at the conference and some reading got me a pass on the first try.
I ran into a BizTalk problem today that was inevitable. We interface with PeopleSoft a bunch. PeopleSoft has some very specific schemas it uses to communicate. We have two separate applications that make use of the same schema in reading responses posted to the PeopleSoft web service. Well, the second of the two projects was put into production a few days ago. When I came in this morning, one of them had a suspended message on the receive side of an HTTP Send/Receive adapter:
Microsoft.BizTalk.DefaultPipelines.XMLReceive, Microsoft.BizTalk.DefaultPipelines, Version=3.0.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" Source: "XML disassembler" Receive Port: "rcvFile" URI: "C:\Test\In\*.xml" Reason: Cannot locate document specification because multiple schemas matched the message type
So, as expected, BizTalk could not figure out which schema to resolve the message to, as it exists in this Application and another. The default XML Receive Pipeline is not very helpful in this situation as it is just that - default. The solution turns out to be quite simple. Create a new Receive Pipeline and add an XML Disassembler to the Disassemble stage. In the XML Disassembler's properties, add the desired schema from the application to the disassembler's Document Schemas collection. This tells the disassembler to select from this list only. If the list is empty, it will behave the same way as the default XML Receive Pipeline.
Spent a few days battling an error on an HTTP Send Port. We've been having messages suspend when posting data to Siebel. The funny part is that the messages got across ok. We were only getting the error on the ACK from the send port. The error message on the suspended message was:
The server committed a protocol violation. Section=ResponseHeader Detail=CR must be followed by LF
We brought this up to the Siebel guys but they were a little slow moving on the information that they were sending us bad responses. After a few days I got nervous that the problem might not be on their end. So I dragged out NetMon. And here's what I saw:
Frame:
+ Ethernet: Etype = Internet IP (IPv4)
+ Ipv4: Next Protocol = TCP, Packet ID = 50895, Total IP Length = 467
+ Tcp: Flags=...PA..., SrcPort=HTTP(80), DstPort=1427, Len=427, Seq=2928398986 - 2928399413, Ack=4246831364, Win=64240 (scale factor not found)
- Http: Response, HTTP/1.1, Status Code = 204
  - Response:
    ProtocolVersion: HTTP/1.1
    StatusCode: 204, No content
    Reason: No Content
    Date: Thu, 15 May 2008 20:28:51 GMT
    Server: Microsoft-IIS/6.0
    XPoweredBy: ASP.NET
    process instance id: 1-1DEHL5
    user-agent: Microsoft (R) BizTalk (R) Server 2006 3.0.1.0
    datahandlingsubsystem: HRFQADProductInbound
    transporttype: HTTP
    expect: 100-continue
    object id: 1-Z02HX
    connection: Keep-Alive
    cache-control: no-cache, must-revalidate, max-age=0
    Pragma: no-cache
    Host: mlbsbltst1vm
    HeaderEnd: CRLF
So I dragged out RFC 2616 to go over what the HTTP Header should look like. I found that the header field name should be a token and that the definition of a token is:
token = 1*<any CHAR except CTLs or separators>
separators = "(" | ")" | "<" | ">" | "@" | "," | ";" | ":" | "\" | <"> | "/" | "[" | "]" | "?" | "=" | "{" | "}" | SP | HT
So, in the aforementioned header the following fields are of interest:
process instance id: 1-1DEHL5
object id: 1-Z02HX
They are bad: both names contain a space, and SP is on the separators list, so neither is a valid token. BizTalk is right. Working on the Siebel guys to take out these custom header fields (or make them compliant at the least).
The problem with troubleshooting something like this is that BizTalk chokes on the message in the pipeline and it doesn't get captured anywhere. I had to reproduce it in an environment where I could use NetMon to grab the actual data from the wire.
Are you experiencing similar BizTalk issues like the ones I described in previous posts? I've seen some search term hits that indicate you might be. If so, please comment or email me directly. I'd like to hear about it and compare notes.
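To make the RFC 2616 rule concrete, here is a minimal sketch of a token check. The class and method names (HttpHeaderNameCheck, IsToken) are made up for illustration and are not part of BizTalk or the .Net Framework; the separator list is copied from the RFC definition quoted above.

```csharp
using System;
using System.Linq;

public static class HttpHeaderNameCheck
{
    // Separators from RFC 2616 section 2.2, including SP and HT.
    private const string Separators = "()<>@,;:\\\"/[]?={} \t";

    // Returns true when 'name' is a valid token: one or more US-ASCII
    // characters, none of which is a control character (0-31, 127)
    // or one of the separators above.
    public static bool IsToken(string name)
    {
        if (string.IsNullOrEmpty(name))
        {
            return false;
        }
        return name.All(c => c > 31 && c < 127 && Separators.IndexOf(c) < 0);
    }

    public static void Main()
    {
        // Header names taken from the NetMon capture above.
        string[] names = { "process instance id", "object id", "user-agent", "Pragma" };
        foreach (string name in names)
        {
            Console.WriteLine("{0}: {1}", name,
                IsToken(name) ? "valid token" : "NOT a valid token");
        }
    }
}
```

Run against the capture, the two Siebel headers fail on the embedded spaces while the standard headers pass, which matches what the HTTP adapter complained about.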
I am in the middle of a medium size BI project where we chose Microsoft for ETL with the SSIS component of SQL Server 2005. For various reasons, we decided on Cognos 8 for the Cube and Presentation layers. As part of the analysis we took into account things like cost, Gartner, in-house skill sets and so on. It was a pretty even race for Cognos & MS Performance Point Server (PPS) and we ended up going with Cognos.
Some background information on our Cognos implementation. It came in-house with a product called Agile. So since we were licensed, we went with it for basic reporting needs. Now we're at the point where we are really looking at BI - time analysis of data, ad hoc analysis, KPIs, and so on. We made an assumption that we could leverage our existing Cognos skill sets into the world of Cognos 8 BI. It wasn't a great bet. We sent some people to training and they took away what most take away from a week long course based on a vendor curriculum (This is not just a Cognos issue, we have a real challenge finding solid training for the Microsoft stuff too).
Now, I was in the same position our Cognos talent was in when I went to work on BizTalk. I had a strong background in the fundamentals of .Net languages and Web development. I went off to take the one week training course (much love to Mark Berry at Dunn Training) and came away with a strong set of basic tools. When I went up against the kind of problems we're hitting in Cognos right now, there was a difference.
Searching for help on Cognos technical issues is really difficult. There is very little out there in the way of web based community. And a lot of what you do find refers to Cognos' KB which is protected by password. I am not sure what the hurdle is to getting the password setup... a call to our account representative and some paperwork. When you're slugging out a technical issue this is not the best customer experience to have.
On the other hand, Microsoft's community is unbelievably rich and returns many hits when searching for answers. BizTalk is a pricey tool and is seldom afforded by those outside of serious enterprise grade businesses – which makes its developer base quite small compared to C#, SQL, ASP.Net, etc. Nevertheless, there is a rich and vibrant community of users who post and share tremendous amounts of technical insight and know-how. I have become truly active in my local developer community in the past couple of years and I see now why Microsoft pours so much effort into these folks. As a direct result, I typically can solve most of my technical glitches or unknowns with a minimal amount of time on Google or Live Search.
I am not saying Microsoft is perfect. I have my issues when I call in for Technical Support and deal with some of the first line folks. I hear the same frustrations from my Cognos counterparts. The nice thing is that there is such a wealth of Microsoft product knowledge living both outside and inside Microsoft, that it’s one of those intangibles that is rarely given due weight in a product study. It certainly keeps the number of calls I’ve made to Microsoft to a minimum. As for which is the best product… another time and another blog post.
Comment: If anyone ever wants to experience the Microsoft community in full force – go to a local Code Camp. I’ve never gotten so many professional contacts in one place. And if there aren’t any near you, call your Microsoft Developer Evangelist and ask nicely for some help. You’d really be amazed.
I've previously blogged about a big BizTalk issue we've been having at work. After many months of traversing the Microsoft Support organization, the proper resources have been brought to bear on the problem. We have hope of a solution on the horizon. Here's a recap with the answers.
Evolution of the Problem
Over a year ago, we stood up a new BizTalk 2006 server infrastructure to be used as the central integration / orchestration point for all enterprise applications. The new environment is very nice with lots of memory, 64-bit processors, load balanced servers, etc. A nice change from the operational BizTalk 2004 environment which was single server and backed by a questionable SQL Server cluster. So with our new BT 2006 environment, we started migrating applications off of BT 2004. We also started creating new applications and expanding operational ones. Life was good.
We use a single file share server for the majority of our file drop locations. This file share server is a Windows 2000 server and is not scheduled to be replaced soon enough by a Windows 2003 R2 server. About half way through our migration, we noticed receive locations on the BizTalk 2006 Server were shutting down every couple of days. This then became every day and then every hour and then every couple of minutes. The problem went from the simple matter of getting the occasional MOM alert and manually restarting the port to being unmanageable. Life was not good.
Our quick fix was to implement a relatively simple C# console application to scan the receive locations every x minutes and restart stopped receive locations. This was accomplished using the BizTalk WMI interface and the Windows task scheduler. Things weren't perfect, but we were operational. Then the fun started. Everything would grind to a halt every couple of weeks, then every couple of days. The symptoms were:
- BizTalk Management Interface was unresponsive
- SQL Server showed a blocking SPID to the SSO DB that would never clear (the system was blocked for 24 hours once before we implemented better alerting)
- Messages stop processing through BizTalk
The clearing procedure became cycling the Enterprise SSO service on the primary SSO server - which required all BT Hosts to be stopped. When the database is blocked, this becomes an hour-long task. Once the ESSO service was recycled, everything was well again.
First Try at a Fix
We were never happy with the ports shutting down and spent a lot of time blaming the less than desirable Windows 2000 server hosting the file shares. We did a lot of Googling and came up with a potential fix for the port shutdown problem: http://support.microsoft.com/default.aspx?scid=kb;en-us;810886. I had previously discussed this here. We increased the registry entries from 50 to 200. The problem didn't resolve. So we walked away from this KB article and resolved to wait for the file share server to be upgraded. We still had to deal with the blocking.
Enter Microsoft Tech Support
Microsoft Technical Support was contacted. We did a health check on the systems and identified many BizTalk housekeeping issues we had. These items were rectified but the problems persisted. In the process of exploring the SQL tables, we noticed the BizTalk work queues filling up with orphaned instances. We had no idea where they were coming from and they were not showing up in the Group Hub. There were actually thousands of them spread across the various host instances in our implementation. We worked with MS support to do some cleanup of our production environment. This seemed to help but then the queues kept filling up again. After some focused digging, we managed to steer ourselves and MS support to the solution. A new hotfix just off the presses - KB936536 (still internal as of this post). We tried it and it did not work. Back to the drawing board.
After some more in-depth digging, it turned out that there was a bug in BizTalk in which receive locations that stopped unexpectedly were not cleaned up properly. This bug left the orphaned instances in the work queues. A patch was created after several weeks and fixed the queue problem. It did not fix the blocking problem.
After some escalation and shuffling around, we were given a new set of support professionals and had the attention of Escalation Engineering, the Product Team, SQL engineers and finally - DTC engineers. Many weeks of logging and capturing increasingly deeper levels of data led to getting the DTC folks involved. It was one of those joyous moments where you send away your tons of log files and get an email back saying 'oh yeah, we've seen this before... there's a hotfix'. There is such joy that there is a solution and a sense of frustration that nobody said this sooner.
The Root of it All
- File Share Server not tweaked to handle the load.
- BizTalk shutting down receive locations.
- Custom BizTalk WMI program keeps restarting ports.
- High WMI/DTC activity brings about KB 934849: A COM+ application that is running on a Windows Server 2003-based computer stops responding and some work items that are queued in the MTA thread pool are not completed.
First Step
Stop the ports from shutting down. Apparently KB 810886 was the solution to stopping the port shutdowns. We needed to increase the registry entries on both the client and server to 2048 to see a difference. Once the port shutdowns stopped, the WMI based port watcher was restarting ports less frequently, which reduced the load on WMI/DTC. When the ports stopped dropping, we set the port watcher to run every 30 minutes. We've not had a problem since. We are now testing KB 934849 on our staging servers. It will be deployed next week to production if all goes well.
SQL Adapter Issue Also?
When we upgraded some SQL Server Adapter-intensive projects from BT 2004 to BT 2006, we experienced a similar level of blocking on the SQL transaction associated with these projects. It only affected the transactional system database and not the BizTalk database. We were never able to fix it despite all of the hints and other tweaks we pulled out of our bag of tricks. Apparently the SQL Adapter in BT 2006 has an increased default isolation level. We followed all of the new guidelines and still had no success. We are hoping that this SQL Adapter blocking was the result of KB 934849 as well. We will be testing this shortly and I will blog it as well.
We've been working on an ongoing issue with BizTalk 2006 at the office. We've had a twofold problem with SQL database blocking and our message queues filling up with control messages. After all of the go-arounds with Microsoft Professional Support Services (PSS), we collected enough data and they were able to reproduce the run-away queues. They took all of this and produced a hot fix within 3 weeks. After testing and installation in production, the blocking returned in the database. But on the good side, the message queues did not fill up again. So problem partially solved but the fun goes on. The hot fix solved the problem of what happens when a cache service calls InstanceComplete. If there is more than one message for the old instance, InstanceComplete gets called more than once, causing the second call to fail because the instance had already been terminated by the previous call. The batch operation is blown and the old caching service instance never gets properly cleaned up. This old instance still gets control messages from the BizTalk core and since it is not actually running... they accumulate. They accumulate a lot!
So the accumulation no longer takes place... but we are still concerned about the blocking. The current suspect is our port watcher application, which enumerates the pesky receive ports that shut down frequently. When the port watcher calls into the WMI object model to enumerate the receive locations, ISSOConfigStore.GetConfigInfo() gets called. This then calls ssox_spLookupXp, which reads from the SSO database and is the source of the blocking. We are running more traces to gather information for PSS. Stay tuned.
Space Coast .Net User Group
Sorry I missed last night's presentation, I am very curious about the Amazon Web Services.
BizTalk Bug
Still waiting on Microsoft to see if this blocking bug can be fixed with their patch. It's taking longer than they expected for the hot fix.
New Data Warehouse project
Got to give a technical proposal review with Harris' CIO this morning. Went pretty well. Had some excellent feedback and am looking forward to diving into my new HR Data Warehouse project!
SSIS Training in NY
Going to New York to get SSIS training on ETL for dimensional models. The class is called Data Warehousing with SSIS (SQL Server 2005 Integration Services) and is offered by Symphic. It promises to be a good class and I get to visit my brother in NYC at the same time. I've brought my daughter's Pablo stuffed animal with me and he'll be posing for pictures while Daddy's away on his trip. Stay tuned....
We've been having a heck of a time lately with the BizTalk working queues filling up with ghost messages. The level of blocking in the database had become unbearable at certain times. We had no idea where it was coming from as our activity levels have remained relatively constant (we hadn't found out about the queues yet). We worked with MS support to do some cleanup of our production environment. This seemed to help but then the queues kept filling up again. After some focused digging, we managed to steer ourselves and MS support to the solution. A new hotfix just off the presses - KB936536. We've applied it today and will see if it works. I will post a more complete write up in a few days.
We've had a problem with our BizTalk Servers for quite a while. Receive ports to a particular file server would seem to shut down almost at random. After much searching of the internet, there was a consensus to build a port watcher application using BizTalk WMI. We did this and it worked: ports would stop and the port watcher would kick them in the butt to get them going again. But this was just a band-aid on a huge sore under the surface. After almost a year of this, a re-researching of the problem finally gave way to a helpful post and then a solution!
It turns out that when a BizTalk Server is checking LOTS of directories on the same server, the network BIOS command limit can be reached:
"This issue may occur if the client computer submits simultaneous, long-term requests against a file server that uses the Server Message Block (SMB) protocol. An example of a long-term request is when a client computer uses the FindFirstChangeNotification function to monitor a server share for changes."
"This issue may occur if the MaxCmds registry value setting on the client is less than 50, or the MaxMpxCt registry value setting on the server is less than 50."
So, per the KB article, an increase of the MaxCmds registry entry to 500 solves the problem (after a reboot)! Not a BizTalk problem at all. The limitation in the Windows 2003 OS designed to throttle its workload (for good reason) was causing the receive location to shut down. The error message from BizTalk wasn't very helpful as it really didn't provide a good error code or reason to point to the command limit that was eventually found to be the problem. Now if I can only figure out why my File Receive locations to DFS shares do the same thing!
There's a point in time where on one side, chaos reigns and the fundamental laws of order seem to be suspended. Then on the other side is the light, the complete understanding of something that now seems simple. This is a story of a boy, an HTTP Send Port and his trust in Microsoft's 'calculations.' It all started with a MOM alert:
Severity: Error
Status: New
Source: BizTalk Server 2006http://localhost/xxxx/sendPoWS/sendPoWS.asmx/upDatePOData
Name: Error: An outbound message is being suspended by the adapter.
Description: A message sent to adapter "HTTP" on send port "scosPoOut" with URI "http://localhost/xxxx/sendPoWS/sendPoWS.asmx/upDatePOData" is suspended. Error details: The HTTP send adapter cannot complete the transmission within the specified time.
Destination: http://localhost/xxxx/sendPoWS/sendPoWS.asmx/upDatePOData
MessageId: {DA614303-8AA6-444E-8312-4C41B4BBD29C}
InstanceID: {954FAA15-BAA0-453A-9AD5-C62E186AF05E}
Cannot Complete the Transmission? It's a request-response HTTP port off of an orchestration... the web service is on the local host... why can't it complete sending the XML message to the service? Let's look at the IIS logs...
#Fields: date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status
2007-06-10 10:59:18 W3SVC1 127.0.0.1 POST /xxxx/services_1/sendDemandWS.asmx/upDateData - 80 - 127.0.0.1 Microsoft+(R)+BizTalk+(R)+Server+2006+3.0.1.0 200 0 64
It has an sc-status of 200? 200 - OK? Checked the web service logs and it successfully submitted the data. But wait... what is this sc-win32-status of 64? Typing net helpmsg 64 at a cmd prompt gets "The specified network name is no longer available". This is localhost, why is it having problems talking to itself? We've had server build issues in the past but how could the TCP/IP stack be hosed? That's impossible. OK. Let's back up and go to BizTalk.
BizTalk is sending the message to the web service via the orchestration's HTTP send port. It retries 5 times per the configuration and then fails. Each time it tries, the message is received and processed by the web service. BizTalk is just not getting back that final ACK. But it's localhost...
After some Google'ing on sc-win32-status and 64, the collective wisdom said it's an indication that one side of the TCP connection is closing prematurely. So am I throwing errors in this web service somebody else wrote that I have to maintain? No... there's a 200 returned by IIS - that's success. So where's the other end of the connection... BizTalk!
The HTTP send port's Request Timeout is set to '0' which is default. When set to zero, the time-out is calculated based on the request message size. Should be fine. But it isn't. How do they calculate the timeout? I found one posting with the answer:
Timeout = Min((180sec + ((MessageSize* 3)/1000)), 3600sec)
So, I have a message that is 6.4 Megs in size. I do the calculation and it's going to top out at 3600 secs. I turn on additional logging in IIS and see that the actual IIS run takes 5900 seconds (I know this is a poor design and is way too long for a TCP connection to hang open - v2 will change the architecture). So back to the beginning. BizTalk is giving up after an hour and retrying. This is why I see the retries going at intervals of 1 hour and 5 minutes. 1 hour to timeout and 5 minutes for the retry interval. Overriding the send port's default value of '0' to '8000' solves the problem.
It's really crazy that the solution was one of the first items that I dismissed. And I hold no malice toward Microsoft's calculation as it is reasonable and this is not a reasonable circumstance. I was really disappointed that the calculation method was not there in bold print. Instead I meandered down many dead ends before I regrouped and went deep. I am a wiser man for having made the journey.
Quick tip for those putting alternate user names and passwords in binding files. When you export your bindings file, BizTalk 2006 replaces the password for the user credentials with a VT_NULL inside the <TransportTypeData> element:
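The arithmetic can be sketched in a few lines. This is only an illustration of the formula quoted above, not the adapter's actual code; it assumes MessageSize is measured in bytes and treats 6.4 Megs as 6,400,000 bytes, and the class and method names are invented for the example.

```csharp
using System;

public static class HttpSendTimeout
{
    // Timeout = Min(180 sec + (MessageSize * 3) / 1000, 3600 sec)
    // Assumption: messageSizeBytes is the request size in bytes.
    public static long DefaultTimeoutSeconds(long messageSizeBytes)
    {
        long computed = 180 + (messageSizeBytes * 3) / 1000;
        return Math.Min(computed, 3600);
    }

    public static void Main()
    {
        // 180 + 19200 = 19380 before the cap, so the cap applies.
        Console.WriteLine(DefaultTimeoutSeconds(6400000)); // prints 3600

        // A 100,000 byte message stays under the cap: 180 + 300.
        Console.WriteLine(DefaultTimeoutSeconds(100000));  // prints 480
    }
}
```

Under these assumptions, anything over roughly 1.14 million bytes lands on the one hour cap, so the 5900 second run on the IIS side can never finish inside the calculated timeout. Add the 5 minute retry interval to the 3600 seconds and you get the 1 hour and 5 minute spacing between attempts.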
<Password vt="1" />
Be careful, that vt="1" means NULL, or in other words - no password. So when you do this:
<Password vt="1">mysecret</Password>
You might as well do nothing. When the file receive adapter imports these bindings, it will not set a password. However, the send adapter does not seem to care and will set the password fine. Regardless, the proper form is to specify the VT_BSTR type (vt="8") in the Password element:
<Password vt="8">mysecret</Password>
You can find this documented in the BizTalk binding file documentation:
Configuration Property Variable Types
File Adapter Configuration Properties
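If you script your deployments, the fix-up can be automated. The following is a rough sketch, assuming the adapter's custom properties have already been pulled out of the <TransportTypeData> element as a plain XML string (un-escaping it first if your binding file stores it escaped). The class name, method name and the sample element names other than Password are illustrative only.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class BindingPasswordFixer
{
    // Rewrites every Password element so it carries vt="8" (VT_BSTR)
    // and the supplied password as its text.
    public static string SetPassword(string customPropsXml, string password)
    {
        XElement root = XElement.Parse(customPropsXml);

        // ToList() so the tree is not modified while it is being enumerated.
        foreach (XElement pw in root.DescendantsAndSelf("Password").ToList())
        {
            pw.SetAttributeValue("vt", "8");
            pw.Value = password;
        }
        return root.ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        // Hypothetical exported fragment with the VT_NULL password.
        string exported =
            "<CustomProps><Username vt=\"8\">svcAccount</Username><Password vt=\"1\" /></CustomProps>";
        Console.WriteLine(SetPassword(exported, "mysecret"));
    }
}
```

Keep in mind this puts a clear text password into the binding file, so treat the result with the same care as any other secret.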
A few months ago I came upon the need to validate some XML in an orchestration against one of the schemas in the project. So, I turned to the wisdom of the net and came across the desired information in the Arch hacker's BizTalk Blog:
http://thearchhacker.blogspot.com/2004/09/cool-xsd-validation-function-for.html
The code he wrote was functional and elegant. I especially like the way he referenced the schema in its assembly using the schema strong name. Nice. But alas, I was upgrading this project to BizTalk 2006 (and thus .Net Framework 2.0) and was notified that:
warning CS0618: 'System.Xml.XmlValidatingReader' is obsolete: 'Use XmlReader created by XmlReader.Create() method using appropriate XmlReaderSettings instead. http://go.microsoft.com/fwlink/?linkid=14202
So in an attempt to rid myself of the annoying compiler warnings, I set out in search of an updated version to no avail. It was time to crack the Framework docs and do it myself. So here it is:
// Namespaces needed by the method below
using System;
using System.IO;
using System.Reflection;
using System.Xml;
using System.Xml.Schema;

public static bool ValidateDocument(
    XmlDocument businessDocument,
    string schemaStrongName,
    ref string xmlValidationException )
{
    // Constants
    const int PARTS_IN_SCHEMA_STRONG_NAME = 2;
    const int PART_CLASS_NAME = 0;
    const int PART_QUALIFIED_ASSEMBLY_NAME = 1;

    // Parse schema strong name ("class name, fully qualified assembly name")
    string[] assemblyNameParts = schemaStrongName.Split( new char[] { ',' }, PARTS_IN_SCHEMA_STRONG_NAME );
    string className = assemblyNameParts[PART_CLASS_NAME].Trim();
    string fullyQualifiedAssemblyName = assemblyNameParts[PART_QUALIFIED_ASSEMBLY_NAME].Trim();

    // Load assembly
    Assembly schemaAssembly = Assembly.Load( fullyQualifiedAssemblyName );

    // Create instance of the BTS schema in order to get to the actual schemas
    Type schemaType = schemaAssembly.GetType( className );
    Microsoft.XLANGs.BaseTypes.SchemaBase btsSchemaCollection =
        ( Microsoft.XLANGs.BaseTypes.SchemaBase ) Activator.CreateInstance( schemaType );

    // Set up XML reader settings for schema validation
    XmlReaderSettings readerSettings = new XmlReaderSettings();
    readerSettings.ValidationType = ValidationType.Schema;
    readerSettings.Schemas.Add( btsSchemaCollection.Schema );

    try
    {
        using( XmlReader reader = XmlReader.Create( new StringReader( businessDocument.OuterXml ), readerSettings ) )
        {
            // Reading to the end validates the whole document; with no
            // ValidationEventHandler set, the first error throws
            while( reader.Read() ) {}
        }
    }
    catch( XmlSchemaException xse )
    {
        xmlValidationException = string.Format( "XML Validation Error: {0}", xse.Message );
        return false;
    }
    catch( Exception ex )
    {
        xmlValidationException = string.Format( "Unexpected Error: {0}", ex.Message );
        return false;
    }

    // success
    xmlValidationException = String.Empty;
    return true;
}
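And it is called from an expression shape as was previously documented in the Arch hacker's blog (validationError here stands for a string variable declared in the orchestration to receive the error text):
ValidateDocument( myBtsMessage, myBtsMessage( BTS.SchemaStrongName ), ref validationError );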