Wednesday, June 29, 2011

Win 2008 Terminal Server Network App Crashing

How many Microsoft Developers does it take to change a lightbulb? None. They just make darkness the new standard.

You can no longer safely run applications from a network location on a Windows Server 2008 or 2008 R2 terminal server. When a user who opened a program across the network logs off, the program crashes for every other user running it on the same server. It appears that Microsoft no longer supports functionality that has been present for 20+ years and is not going to fix the bug, citing "Architectural Changes" in Windows 2008.

Here is a summary of the issue:
  1. When a user opens any network application in a 2008 or R2 Terminal Server environment, the OS creates a File Control Block (FCB). The FCB is a handle that the OS uses to access the program file loaded into memory.

  2. If another user on the terminal server opens the same program, the OS will give the second user access to the first user’s FCB and access to that part of the original user’s memory space that stores the executable.

  3. When the first user logs off, all of their FCBs are dropped and become inaccessible to the other users who were sharing them.

  4. The next action any remaining user performs in the program fails, and the application crashes because it can no longer access its program file.
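The four steps above can be sketched as a toy model. This is only an illustration of the behavior as described in this post, not how Windows actually implements FCBs. The class, the method names, the user names and the `P:\apps\ledger.exe` path are all made up for the example.

```python
# Toy model of the FCB sharing described above.
# Every name here is hypothetical; this is not Windows internals.

class ToyTerminalServer:
    def __init__(self):
        self.fcb_owner = {}   # program path -> user whose session owns the FCB
        self.running = {}     # program path -> set of users running the program

    def open_program(self, user, path):
        # Step 1: the first opener creates the FCB.
        # Step 2: later openers share that same FCB.
        self.fcb_owner.setdefault(path, user)
        self.running.setdefault(path, set()).add(user)

    def log_off(self, user):
        """Log the user off and return {path: users whose program crashes}."""
        stranded = {}
        for path, users in self.running.items():
            users.discard(user)
            # Step 3: the FCB is dropped together with its owner's session.
            if self.fcb_owner.get(path) == user:
                del self.fcb_owner[path]
                if users:
                    # Step 4: everyone still running the program loses it.
                    stranded[path] = sorted(users)
                    users.clear()
        return stranded


server = ToyTerminalServer()
app = r"P:\apps\ledger.exe"            # made-up network path
for name in ("alice", "bob", "carol"):
    server.open_program(name, app)

print(server.log_off("bob"))    # bob did not own the FCB, nobody is affected
print(server.log_off("alice"))  # alice owned it, so carol is stranded
```

In this model only the logoff of the first opener matters, which matches what we see: the crash follows whoever started the program first on that server.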

You can see this reported in the Application event log as two entries with the same time stamp:
  • Application Error, Event ID 1000 – “Faulting application xxxxx.exe, version xxxxx…”

  • Application Error, Event ID 1005 – “Windows cannot access the file for one of the following reasons: there is a problem with the network connection, the disk that the file is stored on, or the storage drivers installed on this computer; or the disk is missing. Windows closed the program XXXXX because of this error.”
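If you have exported the Application log and want to find these crashes quickly, something like the following may help. It is a minimal sketch that assumes each record has already been reduced to a (timestamp, source, event ID) tuple. That record layout, the function name and the sample rows are assumptions made for this example, not a real export format.

```python
from collections import defaultdict

def find_fcb_crash_times(records):
    """Return sorted timestamps that carry both Event ID 1000 and 1005."""
    ids_by_time = defaultdict(set)
    for timestamp, source, event_id in records:
        if source == "Application Error":
            ids_by_time[timestamp].add(event_id)
    # Keep only the timestamps where both IDs were logged together.
    return sorted(t for t, ids in ids_by_time.items() if {1000, 1005} <= ids)


# Made-up sample rows, for illustration only.
sample = [
    ("2011-06-29 09:14:02", "Application Error", 1000),
    ("2011-06-29 09:14:02", "Application Error", 1005),
    ("2011-06-29 10:30:45", "Application Error", 1000),  # a crash with no 1005
]
print(find_fcb_crash_times(sample))
```

A 1000 entry without a matching 1005 is most likely an ordinary application crash, so the sketch ignores it.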

Microsoft says the "New Standard" is to load applications locally or use a WebDAV share. A local install works OK in one- or two-server environments, not so well in 400+ server environments. Applying program updates to multiple servers is an administrative nightmare and a waste of resources as your environment grows. And WebDAV? Really?

We've gotten past Microsoft's Offshore Support Defense Forces and we're finally in discussions with high-level US engineers at Microsoft. We're attempting to convince them of the gravity of the issue and get it resolved. The engineers have acknowledged that they have received multiple reports of the issue, but they say MS development is refusing to fix it.

They said that the bug is coded deep in the 2008 OS and would require architectural redesign, so they are reluctant to make any changes. It looks like they tried to fix it in 2008 R2, because it now shares the FCB of the last user who opened the program file instead of the first, but the program still crashes if that last user logs off before another user opens the file and grabs the ball.
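The R2 variation is easier to see by replaying a few open and logoff events. The sketch below is a toy rule based only on what we have observed (the most recent opener owns the shared FCB). The function name, the event format and the user names are hypothetical.

```python
def replay_r2(events):
    """Replay ("open" | "logoff", user) events for one program file.

    Toy rule for 2008 R2 as described above: the shared FCB belongs to
    whoever opened the file most recently. Returns the users who crash.
    """
    running = set()
    owner = None
    crashed = []
    for action, user in events:
        if action == "open":
            running.add(user)
            owner = user                      # newest opener takes over the FCB
        elif action == "logoff":
            running.discard(user)
            if user == owner:
                crashed.extend(sorted(running))  # everyone else loses the file
                running.clear()
                owner = None
    return crashed


# The last opener logs off before anyone else opens the file: alice crashes.
print(replay_r2([("open", "alice"), ("open", "bob"), ("logoff", "bob")]))

# carol opened the file after bob, so bob's logoff hurts nobody.
print(replay_r2([("open", "alice"), ("open", "bob"),
                 ("open", "carol"), ("logoff", "bob")]))
```

Under this rule the first user can log off safely, but the crash has only moved to whoever opened the program last.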

It seems like they forked in the wrong direction and locked themselves into a faulty design.

Update - 2016.11.03 - The Microsoft KB article 2536487 states that Windows Server 2016 fixes this issue.  If you have installed Windows 2016 and verified that this is fixed, I would love to hear from you in the comments.  Thanks!

Update - 2012.05.01 - Multiple responders have reported that accessing the network files over a DFS share eliminates the FCB errors.  See comments below.

Update - 2011.09.23 - We have had some success running the programs from a UNC path instead of using the mapped network drive.  This is still in testing, but I believe you may also need to remove the mapped drive to eliminate the application crashes. If this works in your environment, please post back in the comments.
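If you want to try the UNC approach, the shortcut targets have to be rewritten from the mapped drive letter to the share behind it. Here is a small sketch of that rewrite. The drive letter, the `\\fileserver\apps` share and the function name are placeholders for illustration; substitute your own mappings.

```python
import ntpath  # Windows path rules, importable on any platform

def to_unc(path, drive_map):
    """Rewrite a mapped-drive path as a UNC path.

    drive_map maps upper-case drive letters such as "P:" to UNC shares.
    Paths on unmapped drives, or paths that are already UNC, are
    returned unchanged.
    """
    drive, rest = ntpath.splitdrive(path)
    share = drive_map.get(drive.upper())
    if share is None:
        return path
    return share.rstrip("\\") + rest


# Hypothetical mapping: P: is mapped to \\fileserver\apps
drive_map = {"P:": r"\\fileserver\apps"}
print(to_unc(r"P:\acct\ledger.exe", drive_map))
```

Remember that, per the note above, you may also need to remove the mapped drive itself; rewriting the shortcuts alone may not be enough.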