By Scott M. Fulton, III, Betanews
What we've been calling a "perception problem" with Windows Vista -- the notion that users may tend to think it's less secure or reliable than it has proven to be on a large scale -- isn't just about perception for users faced with severe reliability problems. As a Windows user for over two decades, I have been to the far depths of unreliability and have lived to tell the tale. Probing the problems with Windows is actually part of my job, and, unlike for most of the world, that's one reason I'm a Windows user at all.
Yesterday, a problem that goes far beyond perception afflicted a 64-bit Vista SP2-based Betanews production system for the fourth time in a year. This time, the remedy was so far out and unusual that everyday users could not possibly have discovered it by normal means. As we've found out, the problem has affected a small number of Vista users since the system's debut three years ago, though that number appears to be growing steadily just as Vista is preparing to vacate the spotlight for Windows 7.
It's being jocularly called the "Black Screen of Death" (KSoD), although unlike its bluer predecessor, it's not about a Windows driver freezing up or an exception or stack overflow locking up the kernel. In fact, the most puzzling aspect of this dilemma has been that Vista is actually running and the logs show all the drivers have loaded and are in working order. You just can't use it -- the screen stays black, except for a bright mouse pointer that you can move around for no reason except to prove your machine hasn't locked up. The usual system keystrokes -- Ctrl-Alt-Del, Ctrl-Esc, Alt-Tab, Ctrl-Shift-Esc -- all appear non-functional.
The first three times we encountered this dilemma, we were able to recover our Vista system through a System Restore rollback, which then enabled us to uninstall and reinstall updates. We dismissed those incidents as peculiar, though perhaps not unusual, behavior from Windows. The fourth time was more serious, although we did recover and, just in case you experience this yourself, we'll tell you how. Our research during the incident indicated that a growing number of Vista users have faced the KSoD in recent months, though the experiences they have shared are so diverse that a precise pathology of the problem has yet to emerge...and Microsoft hasn't been much help in that department.
This amateur YouTube video shows the actual Black Screen of Death behavior on an Alienware notebook computer.
Here is what my experience with Windows is telling me: The KSoD appears to be a behavior that's, to borrow a Microsoft phrase, "by design." In other words, the behavior that lets you see the Vista logon screen, see your users' faces all in a row, and log on under any user name and password just as you normally would...before encountering the inky void, feels like a program behaving the way it was designed to. At present, my belief is that this behavior may be triggered by any number of different events, and that the nature of those events is not directly connected to the behavior. That would explain the different situations users faced leading up to the problem, as well as the fact that what appears to be the solution for some has not been a solution for all.
Back in 2007, Microsoft representatives sent e-mails to their customers warning of the onset of something called "Reduced Functionality Mode," which reportedly enabled Vista to disengage the Start Menu, taskbar, and other features when Microsoft software was determined not to be genuine. That was when the whole "Black Screen" metaphor was first coined. For reasons that we thought at the time had to do with Microsoft listening to its customers, it replaced RFM in Service Pack 1 with a less detrimental piece of built-in nagware. (Our affected Vista machine uses SP2.)
My experience is telling me that the Black Screen is a kind of "curtain" that may originally have been designed as a deliberately aggravating shutdown for suspected software pirates. Think of it as deprecated code that may be hanging around the operating system, like unused DNA for a more evolved species. I have no independent confirmation of this theory; however, it's the only rational explanation I can apply to the fact that Windows truly is running and operational the whole time, as I was able to confirm. Now, did I deserve this curtain? In other words, am I running non-genuine Microsoft software? No. All of our Microsoft software on production systems here is legitimate -- either purchased individually or distributed to us by Microsoft itself through our MSDN subscription.
The first three times I encountered the KSoD, I could not, for the first several tries, boot Vista even to Safe Mode with Command Prompt; in those instances, the system would appear to freeze at the "Please wait" screen. Running the WinPE recovery environment (available either from the Vista install disc or from the extended boot menu that appears when you press F8 on startup) enabled me to roll back the system to a pre-update state, and that appeared to resolve the issue. For the record, two of those incidents began with Vista failing to resume from hibernation; the third seemed to be random, triggered by nothing obvious.
This fourth incident was, to use my grandmother's phrase for recipes that never cooked to her standards of perfection, "a doozy." Not even the earliest system restore point resolved the matter, and I often had to run WinPE from the Vista install disc instead of through the F8 method (although that method did eventually work twice). The incident began with an attempt to load a simple Excel spreadsheet. Excel locked up, so I tried using Task Manager to shut Excel down. Nothing happened, so I tried a reboot from Task Manager. That didn't happen either, so I tried closing all my programs first. They locked up and turned that ghostly shade of white that Vista gives them when that happens. So in desperation, I powered down.
Cue the Rod Serling narration.
Naturally, we have more than one machine here (the actual number of computers in this office is typically a fractional value), so it was through a working system that I was able to research other cases of the Black Screen of Death online.
Last December, the CEO of an independent IT services provider in Charlotte, N.C., discovered one cause of the KSoD that affected multiple customer systems. His customers use a managed system support driver called Zenith SAAZ. For reasons he was unable to determine, that driver left a System Registry entry with an incorrect value. Customers who were able to boot their systems into the WinPE (or WinRE, depending on which end of Microsoft talks about it) recovery environment were able to start RegEdit, replace the offending entry with a correct value, restart their systems, and eliminate the KSoD.
That wasn't our situation -- not being a customer of his, we didn't have the offending Registry entry.
Last month, a Belgian security engineer and Microsoft MVP (Most Valuable Professional, an independent expert recognized by Microsoft but not employed by it) named Mark Gregoire encountered the KSoD and found that his solution involved disabling event logging. But Gregoire could accomplish this through Safe Mode with Command Prompt; I couldn't even get that far.
Disabling event logging this way meant using a program called MSCONFIG, which Windows veterans have been familiar with since Windows 95. With it, you can set your system for a "diagnostic boot" that disables all or some system services. From there, you can re-enable the services you no longer suspect to be the culprit.
But in Vista, MSCONFIG requires not only administrator privileges, but also that it be run by a logged-on member of the Administrators group. For that (stupid) reason, you can't run MSCONFIG from the WinPE environment -- that's right, you're barred from using the one tool you need to recover your computer, because its own security is incompatible with a recovery environment that can't log you on as an administrator precisely because your computer needs recovering. (No, Mark Minasi, the RUNAS command won't work here either.)
So I needed to find some way to get into my system with the privileges I needed to run MSCONFIG. Here, Gregoire's second paragraph provided the amazing clue that fetched my system out of its misery: "When you are presented with a KSoD, you can try to press the left Shift button a few times to trigger the sticky keys feature of Windows. This will pop up a window that contains a link. You can then click this link and from there you are able to launch different applications. Of course, if you disabled sticky keys, you are out of luck."
As I mentioned earlier, Windows actually runs while all this blackness is going on. Though none of the usual system keystrokes are functional, the routine deep in the keyboard handler that enables one-poke-at-a-time multiple keystrokes is still operating. When you're logging on, this is evidenced by the fact that you can poke the left Shift key five times in rapid succession and get the initial "Sticky Keys" dialog -- the first indication of life in the darkness.
After you've logged on, you're officially in the Aero environment, whose "Sticky Keys" dialog is different from that of the 2D environment, for undocumented but strangely useful reasons. The Aero version contains a single hyperlink, which reads, "Go to the Ease of Access Center to disable the keyboard layout." That single hyperlink is our ticket home, because Ease of Access Center is part of Control Panel, and Control Panel is part of Windows Explorer.
What's more, the fact that you can see a piece of your desktop wallpaper through the transparency of the Sticky Keys dialog's title bar (though this screenshot was taken later) is a clear indicator that Windows is running perfectly, and the Black Screen is just a very effective façade.
Once you get to the Ease of Access Center, you can back out using the Location bar to get a directory of your main system drive. There you'll be relieved to see that everything is fine: your file system is intact, and none of your documents are lost. What you need to do now is locate MSCONFIG in your system directory (usually \Windows\System32) and run it to get to your diagnostic setup.
Believe it or not, you're in a race against the clock at this point, just like the climactic scene of a really bad thriller movie. For some reason, some process periodically terminates any program you've started, returning you to the KSoD...and not even the Sticky Keys trick will work again until you reboot. My tests show that period to be pretty close to five minutes, which is one more indicator to me that this behavior truly is by design, the leftover remains of a trap intended to befuddle software pirates. After all, what system process forces running applications to exit every five minutes, almost on the dot, unless it's programmed to do so?
With your stopwatch running, start MSCONFIG and, on the General tab, select Selective startup. Then go to the Services tab, click the Disable all button, and click Apply, then OK (for some strange reason, Apply has to come first). If there's still time remaining, you'll be shown a dialog asking you to restart. Click Restart. Your system may or may not actually restart (in our case, it didn't); if it doesn't, power down manually.
When you restart, you should at least be able to log on to an account and get to your desktop. Your system will look like Safe Mode for paranoids, or like "Windows 95 Minus." Don't fret, because all is actually quite well. From the Start Menu, start MSCONFIG again, go to the Services tab, and click Enable all. Then go through the services list and disable any third-party service you're not familiar with. You'll see anti-virus services here, and you can leave those enabled.
By all means, however, be sure to disable Windows Event Log. This is the bugger in our situation, and the probable culprit in a great many (though apparently not all) KSoD incidents. After rebooting our Vista system, we were returned to a normal world.
At this point, as computer engineer Mohannad Shaheen suggests in a comment on an IT consultant's blog, the event log files themselves could be corrupt. Shaheen advised that, while the event logging service is disabled, you rename the %systemroot%\System32\winevt\Logs directory to Logs_bad, create a new, empty Logs directory, and then re-enable the service. Others advised that the event logging service is only useful for certain administrators anyway.
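Shaheen's rename-and-recreate step can be sketched in Python roughly as follows. This is a minimal sketch, not his exact procedure: the function name `quarantine_event_logs` is hypothetical, and it assumes the Windows Event Log service has already been disabled (so no process holds the log files open), that it runs with administrator rights, and that %systemroot% can be read from the `SystemRoot` environment variable, falling back to `C:\Windows`. Re-enabling the service afterward is still done by hand in MSCONFIG.

```python
import os
from pathlib import Path
from typing import Optional


def quarantine_event_logs(system_root: Optional[str] = None) -> Path:
    """Hypothetical helper: move the possibly corrupt event log folder aside
    and create an empty replacement, following Shaheen's suggested steps.

    Assumes the Windows Event Log service is already disabled, so no process
    holds the log files open; otherwise the rename will fail.
    """
    if system_root is None:
        # %systemroot% is visible to programs as the SystemRoot environment variable.
        system_root = os.environ.get("SystemRoot", r"C:\Windows")

    logs_dir = Path(system_root) / "System32" / "winevt" / "Logs"
    backup_dir = logs_dir.with_name("Logs_bad")

    if backup_dir.exists():
        # Don't clobber an earlier backup; inspect or remove it by hand first.
        raise FileExistsError(f"{backup_dir} already exists")

    # Step 1: rename Logs to Logs_bad, keeping the old files for inspection.
    logs_dir.rename(backup_dir)
    # Step 2: create a fresh, empty Logs folder for the service to repopulate.
    logs_dir.mkdir()
    return backup_dir
```

On the affected machine you would call `quarantine_event_logs()` with no arguments, then re-enable Windows Event Log and reboot. Keeping the old folder as Logs_bad rather than deleting it leaves the original files available for inspection. Note that the freshly created Logs folder inherits its permissions from its parent folder; this sketch does not copy any special permissions the original folder may have had.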
When an operating system does suffer from a perception problem, as Vista clearly does, problems such as the Black Screen of Death only serve to confirm users' worst suspicions. Obviously Vista is flawed, the argument goes, because you can see the evidence for yourself -- and certainly since yesterday, it's more difficult for me to argue otherwise. But if the root cause of this behavior is, as I now strongly suspect, disavowed Microsoft code intended at one time as a warning sign for pirates, then from a technical standpoint, the solution is probably extremely simple: a patch that removes the code. Issuing such a patch may cause Microsoft short-term grief, but it might be better for the company to face up to this mistake and undo it. The alternative may be a rising tide of disaffected Windows users whose amplified warnings to other legitimate users could carry far more weight than any black curtain Microsoft might deploy.
Scott sincerely thanks the IT professionals linked to in this article for their hard work in helping him and the rest of the world obtain a reasonable solution.
Copyright Betanews, Inc. 2009