It was the crisis that almost ended my career -- my real career, specifically, as an elite software architect-for-hire to some of the largest companies in the world.
It was April 2009, and my family and I had just set sail on a 26-day Costa repositioning cruise from our home on Mauritius. Destination: Savona, Italy, by way of the African coast, the Suez Canal and Egypt. It was day three and we were docked at the port of Nosy Be on the coast of Madagascar when a quick check of my email at a local Internet café revealed a storm brewing back at 750 7th Avenue, NYC.
Those familiar with the midtown financial district will note that "750 seventh" is the home of Morgan Stanley, one of the largest financial services companies in the world and (at the time) my largest client. I had spent the past nine years developing and deploying a sophisticated, n-tier performance management framework across the bulk of their trading infrastructure. I wrote every line of code, from the T-SQL statements on the back-end SQL Servers to the ASP.NET middleware to the Windows client services pulling and aggregating performance metrics from PDH.DLL. It was my baby, and baby was sick.
Specifically, the client agent was gobbling up kernel handles and never letting them go, threatening to blue-screen thousands of critical trading workstations. The Morgan IT folks spotted the problem (ironically, it was my own monitoring software that alerted them to the leak) and wisely shut down the service until they could contact me.
But there was a problem. I was 11,000 miles away floating around in the Indian Ocean. In a classic case of Murphy’s Law come true, something that could go wrong did go wrong, and right in the middle of my vacation. It was the kind of situation that could have torpedoed my relationship with Morgan. I was a one-man consulting shop, which meant I had nobody on land to back me up.
What I did have was a shiny new Dell Precision M60 Portable Workstation with 12GB of RAM and a collection of VMware Virtual Machines with my entire solutions stack. Also along for the ride was the installation media for nearly every piece of Microsoft software that was even vaguely related to my solution. I may have been semi-incommunicado, but I was prepared to fix almost anything should the need arise.
So, after a garbled Skype call to my primary Morgan contact, I went back to my cabin on the ship and set to work tracking down the fault. Since this was a new behavior, and since my client software had been running literally for years at Morgan without incident, I immediately focused on external changes. My contact noted that the only significant modification they had made on the client side was upgrading everyone from Internet Explorer 6.0 to Internet Explorer 7.0 (remember, this is back in 2009). So I knew something about that change had broken my agent. But what?
I fired up VMware and recreated their updated client stack. Sure enough, my agent was gobbling handles like they were going out of style. Confident that my code was not the real source, I started looking at libraries that both IE and my agent shared. The culprit turned out to be winsock.dll, the library used by virtually all Windows applications and services to communicate over the Internet.
My client agent was designed to communicate over HTTP/HTTPS, using a custom command language and a combination of GET and POST commands to both remotely manage the agent services and also transfer data up through the ASP.NET middleware to the SQL Server back end. And every time it did so, it used winsock.dll to establish the connection and manage the various client/server interactions.
It was an elegant solution, one that had the added benefit of being compatible with virtually any network topology or firewall configuration. In fact, it was this inherent communications flexibility that later allowed me to deploy this same platform globally as a free monitoring service for Windows PCs and servers (i.e. "Windows Sentinel").
But now winsock.dll was causing me grief. It seemed that, when Microsoft released IE 7.0, they also updated winsock.dll, and this new version was causing my agent to leak handles. Specifically, when my agent would open a connection to the upstream middleware server, the subsequent close command failed to prompt winsock.dll to release the handle back to the kernel pool.
It was a serious bug, one that Microsoft had introduced as part of the IE 7.0 upgrade package, and one that I now had to work around. Thankfully, the fix was easy. Instead of simply opening the connection and initiating a request, I would open the connection, close it, and then open it a second time before initiating the request. Doing so stopped the handle leak, and for the purposes of my fix that’s all I needed to know (I never did get a satisfactory answer from Microsoft as to why it occurred in the first place, but I digress).
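For readers who want to picture the workaround, here is a minimal sketch of the open, close, reopen pattern in Python. It is not the agent's actual code, which was a Windows service built in Visual Studio. The names `FakeConnection` and `double_tap_open` are hypothetical, and the fake connection only records its own state rather than touching winsock.dll or real kernel handles.

```python
from typing import Callable, List


class FakeConnection:
    """Hypothetical stand-in for the agent's upstream connection object."""

    # Every connection ever created, in order, so the pattern can be inspected.
    instances: List["FakeConnection"] = []

    def __init__(self) -> None:
        self.is_open = True
        FakeConnection.instances.append(self)

    def close(self) -> None:
        self.is_open = False


def double_tap_open(open_connection: Callable[[], FakeConnection]) -> FakeConnection:
    """Open a connection, close it at once, then open a second one.

    Only the second connection is returned and used for the real request.
    """
    throwaway = open_connection()  # first "tap": opened and discarded
    throwaway.close()
    return open_connection()       # second "tap": the one the caller keeps
```

Calling `double_tap_open(FakeConnection)` leaves two entries in `FakeConnection.instances`: a closed throwaway and the open connection handed back to the caller. Why this sequence stopped the leak in the real agent is not something the sketch explains; it only shows the shape of the workaround.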
Armed with my "solution", I fired up Visual Studio and quickly rolled a new version of the agent, this time with the winsock.dll "double-tap" (as I called it internally) in place. By the time we docked in Mombasa, Kenya, I was ready with a newly minted MSI containing my fixed version and instructions on how to patch the affected systems. And 48 hours later, as I was wandering the streets of the Port of Aden in Yemen (yeah, I’m suicidal that way), I got word that Morgan had deployed the new version and all was well.
Needless to say, it was an educational experience for me. Not only did it teach me the value of having the right tools and resources within easy reach, it also prompted me to update my own regression testing to include not only legacy systems that my clients might use, but also every possible upgrade path they might pursue from that point forward (hint: communication is key here). Because, as my experience showed, you can never predict what your client might do while you’re away.
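One way to think about covering upgrade paths is as a small test matrix: every baseline a client might still be running, plus every forward upgrade between versions. The sketch below is illustrative only. The function name `upgrade_matrix` and the version list are assumptions, not my actual regression suite.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def upgrade_matrix(versions: Sequence[str]) -> List[Tuple[str, str]]:
    """Return (installed, upgraded_to) pairs to regression-test.

    `versions` must be listed oldest first. The result contains each
    version unchanged (the legacy baselines) followed by every forward
    upgrade, including ones that skip intermediate releases.
    """
    baselines = [(v, v) for v in versions]
    upgrades = list(combinations(versions, 2))  # pairs keep oldest-first order
    return baselines + upgrades


# Hypothetical browser versions a client fleet might move through.
matrix = upgrade_matrix(["IE 6.0", "IE 7.0", "IE 8.0"])
```

Each pair would then drive one test run: install the first version, deploy the agent, upgrade to the second, and watch the handle count.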
Bottom Line: When you’re a one-person shop who’s managed to win the trust of one of the largest companies in the world, you never really go on "vacation".
Photo Credit: a_v_d/Shutterstock