Bear Naked Code: Win 2008 Terminal Server Network App Crashing

Wednesday, June 29, 2011

Win 2008 Terminal Server Network App Crashing

How many Microsoft Developers does it take to change a lightbulb? None. They just make darkness the new standard.

You can no longer safely run applications from a network location in Windows 2008 & R2 terminal server. When a user opens a program across the network and logs off, it will crash the program for all other users running it on the same server. It appears that Microsoft is no longer supporting functionality that has been present for 20+ years and is not going to fix the bug due to "Architectural Changes" in Windows 2008.

http://support.microsoft.com/kb/2536487

Here is a summary of the issue:

When a user opens any network application in a 2008 or R2 Terminal Server environment, the OS creates a File Control Block (FCB). The FCB is a handle that the OS uses to access the program file loaded into memory.
If another user on the terminal server opens the same program, the OS will give the second user access to the first user’s FCB and access to that part of the original user’s memory space that stores the executable.
When the first user logs off, all their FCB’s are dropped and become inaccessible to other users that were sharing them.
The next action the remaining users perform in the program fails and it crashes the application because it cannot access the program files.

You can see this reported in the Application event log as two entries with the same time stamp:

Application Error, Event ID 1000 – “Faulting application xxxxx.exe, version xxxxx…”
Application Error, Event ID 1005 – “Windows cannot access the file for one of the following reasons: there is a problem with the network connection, the disk that the file is stored on, or the storage drivers installed on this computer; or the disk is missing. Windows closed the program XXXXX because of this error.”
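
If you export the Application log and want to spot this signature quickly, a minimal sketch like the following may help. It assumes you have already reduced the log to (timestamp, event ID) pairs; that input shape and the function name are made up for illustration and are not a real log format.

```python
from collections import defaultdict


def find_fcb_crash_pairs(events):
    """Return the timestamps at which both Event ID 1000 and 1005 were logged.

    `events` is an iterable of (timestamp, event_id) tuples. This shape is an
    assumption for illustration; adapt it to however you export the log.
    """
    ids_by_time = defaultdict(set)
    for timestamp, event_id in events:
        ids_by_time[timestamp].add(event_id)
    # Keep only the timestamps where both event IDs appear together.
    return sorted(t for t, ids in ids_by_time.items() if {1000, 1005} <= ids)


sample = [
    ("2011-06-29 10:15:02", 1000),
    ("2011-06-29 10:15:02", 1005),
    ("2011-06-29 11:40:17", 1000),  # a crash without the 1005 companion entry
]
print(find_fcb_crash_pairs(sample))
```

Only the first timestamp in the sample carries both entries, so it is the only one reported. A lone Event ID 1000 is an ordinary application crash and may have nothing to do with this bug.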

Microsoft says the "New Standard" is to load applications locally or use a WebDAV share. Local install works ok in one or two-server environments, not so well in 400+ server environments. Applying program updates to multiple servers is an administrative nightmare and a waste of resources as your environment grows. And WebDAV? Really?

We've gotten past Microsoft's Offshore Support Defense Forces and we're finally in discussions with high-level US engineers at Microsoft. We're attempting to convince them of the gravity of the issue and to resolve it. The engineers have acknowledged that they have received multiple reports of the issue but that MS development is refusing to fix it.

They said that the bug is coded deep in the 2008 OS and would require architectural redesign, so they are reluctant to make any changes. It looks like they tried to fix it in 2008 R2, because it now shares the FCB of the last user who opened the program file instead of the first, but it still crashes when they log off before another user grabs the ball.
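
To make the ownership rules concrete, here is a toy model of the behavior as described above. It is only a sketch of my understanding, not how the Windows redirector is actually implemented; the class and its "first"/"last" policy names are invented for illustration.

```python
class SharedBinary:
    """Toy model of one network executable shared across user sessions."""

    def __init__(self, owner_policy):
        # "first": the first user to open the file owns the FCB (2008, as described).
        # "last":  the most recent user to open it owns the FCB (2008 R2, as described).
        if owner_policy not in ("first", "last"):
            raise ValueError("owner_policy must be 'first' or 'last'")
        self.owner_policy = owner_policy
        self.owner = None
        self.sessions = []

    def open(self, user):
        """A user launches the program from the network location."""
        self.sessions.append(user)
        if self.owner is None or self.owner_policy == "last":
            self.owner = user

    def logoff(self, user):
        """Log a user off and return the sessions left without a usable FCB."""
        self.sessions.remove(user)
        if user != self.owner:
            return []
        # The owner's FCB is dropped; everyone still sharing it is stranded.
        self.owner = None
        stranded, self.sessions = self.sessions, []
        return stranded


for policy in ("first", "last"):
    app = SharedBinary(policy)
    app.open("user1")
    app.open("user2")
    print(policy, "-> stranded after user1 logs off:", app.logoff("user1"))
```

Under the "first" policy, user2 is stranded when user1 logs off. Under the "last" policy, user1 can log off safely, but user2 logging off would strand user1 instead. The real crash is also lazier than this model: it only shows up when a remaining session next needs to read from the binary.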

It seems like they forked in the wrong direction and locked themselves into a faulty design.

Update - 2016.11.03 - The Microsoft KB article 2536487 states that Windows Server 2016 fixes this issue. If you have installed Windows 2016 and verified that this is fixed, I would love to hear from you in the comments. Thanks!

Update - 2012.05.01 - Multiple responders have reported that accessing the network files over a DFS share eliminates the FCB errors. See comments below.

Update - 2011.09.23 - We have had some success running the programs from a UNC path instead of using the mapped network drive. This is still in testing, but I believe you may also need to remove the mapped drive to eliminate the application crashes. If this works in your environment, please post back in the comments.

58 comments:

Anonymous said...: I'm with you 100%. I'm experiencing this issue supporting customers running my application from a Network Share and I've already notified them to not use Windows 2008 and 2008 R2.; September 23, 2011 at 9:35 AM
Adrian Hayes said...: We've seen some mixed success with running the applications via UNC path instead of a mapped drive. I'm not sure if Windows shares FCB's when running via UNC.

I haven't nailed it down yet, but I believe it may succeed if you no longer have a mapped drive on the same network share.; September 23, 2011 at 10:44 PM
Anonymous said...: We are experiencing this exact issue on a Windows 2008 TS/Xenapp server running applications from a network share on a Windows 2003 R1 server.

We will try changing the shortcut to a UNC path to see if that resolves the issue, but we don't have the option to remove the mapped drive as it will impact too many other applications.; September 26, 2011 at 6:18 PM
Adrian Hayes said...: It's pretty easy to definitively test to see if it's resolved:

1) Log in as User 1 and launch the program from the UNC path. Use it for a little while - enough to load several binaries into memory

2) Log in as User 2 and launch the same program from the UNC path. Again, run a while.

3) If on 2008 R2, log off User 2. If on Windows 2008, non-R2, log off User 1.

4) Attempt to use the program on the remaining user session.

I find that it often takes a while to duplicate the application crash. It doesn't occur until the program needs to read from a binary (.exe, .dll, etc.) that it hasn't yet loaded into memory on the current session but had been accessed on the logged off session.; September 27, 2011 at 12:17 PM
Anonymous said...: Hi. We have just implemented a Citrix farm and this is one of the problems we experienced.

Can you post again if you get anywhere with Microsoft.; November 8, 2011 at 6:37 AM
Adrian Hayes said...: Phil,

I'm not holding my breath for a hotfix. Even if one were forthcoming, our Enterprise contact told us it would be at least Windows 9 (read: 5-7 years) before it was implemented. MS has pinned their performance improvements on the features that cause this issue.

We're mitigating the issue by implementing Citrix Provisioning Server to push software updates out to our servers. It's not pretty, but it will come with the side benefit of reducing the load on our filers.; November 8, 2011 at 10:01 AM
Anonymous said...: One workaround to this is to create multiple shares for the same physical path and assign each share to a single user. For example, the physical path F:\SHARED is shared as both \\FS\USER1 and \\FS\USER2. USER1 maps the X: drive to \\FS\USER1 while USER2 maps the X: drive to \\FS\USER2. Both USER1 and USER2 access the same files and folders. The open question is the maximum number of shares per folder. This has been tested and worked without problems for two users, but it is unclear what will happen with 100 or more users.; December 9, 2011 at 6:18 AM
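
The per-user share idea above could be scripted roughly as follows. This is a sketch that only prints the `net share` and `net use` commands for review; it does not run them. The function name and the choice to use the upper-cased user name as the share name are assumptions taken from the example in the comment.

```python
def per_user_share_plan(server, physical_path, users, drive="X:"):
    """Build one share-creation and one drive-mapping command per user.

    The commands are returned as strings for review; nothing is executed.
    """
    plan = []
    for user in users:
        share = user.upper()  # assumed naming scheme: share name = user name
        plan.append({
            "user": user,
            # Run on the file server: publish the same folder under a new name.
            "create_share": f"net share {share}={physical_path}",
            # Run in the user's session: map the drive to that user's own share.
            "map_drive": f"net use {drive} \\\\{server}\\{share}",
        })
    return plan


for entry in per_user_share_plan("FS", "F:\\SHARED", ["user1", "user2"]):
    print(entry["create_share"])
    print(entry["map_drive"])
```

Whether this scales to 100 or more shares on one folder is exactly the question left open in the comment; the sketch does nothing to answer it.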
Mike S. said...: I have been unable to find a hotfix or Windows Update that fixes this issue (seen sporadically). Does anyone else have an update on this?; January 17, 2012 at 2:20 PM
Anonymous said...: Hi,
if you must continue to use your apps through a network drive in Terminal Services on Windows 2008, you can use DFS as explained here: http://social.technet.microsoft.com/Forums/en-US/winserverTS/thread/d7bab29e-2bce-4785-9abb-259cff3a153e?prof=required

Carlo; January 26, 2012 at 6:55 AM
Adrian Hayes said...: Carlo - very interesting. I'll have to try that. Thanks.; January 26, 2012 at 8:57 AM
smicke said...: I'm using a UNC path at the moment and it works OK, but not 100%. Has anyone heard anything more about this issue?; February 15, 2012 at 7:52 AM
Anonymous said...: I believe that we are experiencing the same issue in our environment. I know that it has been asked, but what kind of results are we getting with the UNC path?; March 22, 2012 at 7:36 PM
Anonymous said...: Just some follow-up questions:

1. Is the issue caused by the closing of the network app or logging off of the RDP session?

2. Does it matter if the current FCB owner closes out the network app cleanly before logging off?; March 22, 2012 at 9:10 PM
Adrian Hayes said...: What happens if you add your server to the Local Intranet zone in Internet Options? I've had that resolve some performance issues.

AFAIK, the holder of the FCB maintains it even if they close the file. I would imagine it would get unloaded from memory only if there are no further references to it by other users. However, I have not tested this scenario.

The issue triggers when the user who is currently holding the FCB logs off.; March 23, 2012 at 1:52 PM
Paul Jeffries said...: I moved the application so it ran from C:\ rather than from a mapped drive and have seen a significant improvement.; March 28, 2012 at 12:13 PM
Scott said...: This issue has been causing us nothing but grief as well where I work...

A network application that has worked great for years now throws fits when set up on our new Windows 2008 (32-Bit)/XenApp Server environment. The application is quite old but a critical component of the business; so we had to accommodate it (including using 2008 rather than 2008 R2 unfortunately).

After a lot of fix attempts and numerous chats with the application's vendor, it was clear it was a known issue, and their workaround was messy: it required us to copy the exe's to a local folder and then use old DOS-based SUBST commands to point to it.
We then found this MS KB article:

http://support.microsoft.com/kb/2536487

WebDAV? Really? It was impractical and TERRIBLY slow (NOT WORKABLE). We opened an incident with Microsoft, we got a canned response that there was a HotFix. Only to discover there was NO hotfix. Only to discover they were internally TESTING a Hotfix.. etc... etc... So, nothing useful in this case.

Then, as a long shot, we tried DFS shares instead of standard shares. Needless to say I was very skeptical that this would do any good since neither Microsoft nor the vendor suggests it as a workaround.

However, since implementing DFS shares and pointing our XenApp servers to the DFS shares we have seen a DRAMATIC reduction in the number of errors. To the point that we can now proceed with migrating users off of old Citrix servers.
I don't know if this info will help anyone else with their network-run legacy applications - but - I certainly hope so!

Good Luck!; April 30, 2012 at 2:32 PM
Adrian Hayes said...: Scott - interesting news on the hotfix. I was under the impression that we'd sooner be buying parkas and snow boots for our Hades vacations than the emergence of a hotfix. I'll keep an eye out for that.

I can confirm the performance issues with WEBDAV. In our testing it took 20-30 seconds just to bring up a directory listing on a folder with a handful of files.

Thanks for the update on using DFS paths. I guess that is a workable alternative.; May 1, 2012 at 5:59 PM
Anonymous said...: We have implemented the DFS method with success.

We are monitoring.; May 2, 2012 at 6:35 PM
Phillip Salomon said...: Is this an issue only with applications, or with any file shared on a mapped network drive? We are getting unexpected network errors in general, not just with applications (we have moved our apps to the c:\ folder on every Citrix server).; June 1, 2012 at 11:23 PM
Adrian Hayes said...: Phillip,

No, this issue only affects running binary executable files from a mapped network drive. We did extensive testing with all types of data files and could not duplicate the issues if the program files were loaded locally.

The problem arises because the terminal server is sharing the executable memory space between users. Data would be inappropriate to share outside a user's session. The executable files are the same for everyone and require no privacy between sessions.; June 1, 2012 at 11:48 PM
Adrian Hayes said...: Phillip - one other note. Did you move the programs and just redirect the shortcuts / published applications or did you uninstall and reinstall them? If you did not reinstall, you may have binaries still registered to the network location. If you haven't unmapped the drive, try unmapping it and seeing if the program fails to run or the errors cease.

You can do a definitive test by getting a copy of Procmon from http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx. Run your program and look for any references to .exe's or .dll's on your network location. If they're getting accessed from the mapped drive, the bug is still in effect.; June 1, 2012 at 11:54 PM
Phillip Salomon said...: Hi Adrian
We are definitely seeing the issue with normal data files, not just executables. Looking at the return value from GetLastError, it's returning either 59 or 64.

ERROR_UNEXP_NET_ERR 59 (0x3B)
An unexpected network error occurred.
ERROR_NETNAME_DELETED 64 (0x40)
The specified network name is no longer available.

It seems to affect some of our users worse than others. I just tried to close and reopen the file but that has not worked either.; July 18, 2012 at 7:03 AM
Adrian Hayes said...: Phillip - you could be experiencing a problem caused by OpLocks. Check out Jeffry Land's post here. I've had a few instances where disabling Opportunistic Locking has fixed some strange data file issues.

With Win 2008, you will need to disable SMB 2 as well to disable OpLocks. I've used this page as a guide. However, you may see some performance hits if you do this. I would try it as a test. If it doesn't fix the issue, I would re-enable SMB 2 and OpLocks.; July 31, 2012 at 3:16 PM
Adrian Hayes said...: An interesting side note: Microsoft KB 2536487 now lists running from a UNC path as a viable workaround. It previously included only local install or WebDAV as workarounds. However, still no mention of a hotfix.; July 31, 2012 at 3:28 PM
Anonymous said...: They listed UNC as a workaround because it worked for their Global Escalation Tech in a quick and dirty trial.

I used that method with 35 users on 2k8R2/XA6.5 and the issue still presented itself. I had no choice but to roll back all users to gain consistency.

Of course MS wants more logs, but I'm not willing to put my users, nor local IT staff through any more pain in having the issue occur to capture logs.

We're attempting a hotfix request as this is a HUGE setback for our migration - 500+ servers, 25K users.; August 14, 2012 at 12:57 PM
Adrian Hayes said...: Good luck with that hotfix request. :)

We've been very happy with Provisioning server. The farms that we switched over are far more stable and the performance has been great.; August 14, 2012 at 1:34 PM
Anonymous said...: We are going to pilot having the app local to the citrix server, on the d drive (local cache drive), which will allow any updates to the code to persist. This prevents the citrix admins from having to always coordinate a private image with the app team when they need to deploy updates.; August 15, 2012 at 11:20 PM
Wendell said...: Here is a potential workaround that uses the MKLINK command to map a UNC path, which appears to bypass issues with FCB's:

1) run "MKLINK /D [localdir] [\\Server\Share]" to mount a local directory as a network mount-point. If this is all that you need you can stop there, however if you also need a drive letter assigned go on to step #2.

2) Use SUBST to map a drive letter to the mount-point, such as "SUBST M: C:\localdir"

So far seems to work well.; October 23, 2012 at 7:46 PM
Adrian Hayes said...: Interesting. How is performance?; October 23, 2012 at 7:56 PM
Anonymous said...: I have tried the MKLINK and SUBST workaround, but no luck. My application crashed at the first try. The bug is easy to reproduce. I have tested the UNC path too, but with no luck. No fix so far.

We have only 6 terminal servers so i'm going to copy files locally. It's definitely not the solution i'm looking for but it's the only one we have right now.; February 28, 2013 at 10:33 AM
Adrian Hayes said...: Ensure that you don't have registered binaries going back to the mapped network drive.

Get a copy of Process Monitor (http://technet.microsoft.com/en-us/sysinternals/bb896645) and run it on the terminal server with only one user on the box. Launch your application while capturing disk activity. Filter by path and enter your mapped network drive. If you see the mapped drive on any of the .dll or .exe files, you're still experiencing the FCB bug.; February 28, 2013 at 11:17 AM
Anonymous said...: Does anyone know if this is fixed in server 2012 or has tested it on a 2012 server?; May 28, 2013 at 9:00 AM
Jono said...: I've been testing this for a client. I have a simple setup of a single terminal server with two users, plus a fileserver where our app is installed. Our app has a 'workstation setup' which I ran from a UNC path (this sets up all file paths to unc paths). After this, I cannot reproduce the problem when running the app from the UNC path. If I map a drive to the network location and run from there, I can replicate the issue every time. Even if I leave the drive mapped, and revert to running from UNC, I cannot reproduce the problem. Am I missing something as other user report they still get problems using UNC? I would also like to know if the problem is fully resolved in Server 2012. Anyone know?; June 20, 2013 at 12:28 PM
Adrian Hayes said...: If all your references to executables in the registry point to the UNC path, you should be OK.

For those that still get errors, there are probably still references back to the mapped drive. Procmon from Windows Sysinternals (http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx) is a great resource to ferret this out. Watch what files the program opens and see if any paths point back to the mapped drive.

Another possibility is that there is another issue coming into play in addition to the FCB bug. By removing the FCB bug crashes, the other errors that happened less frequently will stand out.

I haven't tested on Windows 2012, but my Microsoft engineer said don't expect it anytime soon (he mentioned until after Windows 9, if fixed at all).; June 20, 2013 at 1:10 PM
Jono said...: Just after posting I tried the procmon test and didn't see any references to the mapped drive path. Interestingly, I repeated the procmon test when running the app from the mapped drive. All the entries in the log still showed a UNC path for the executables!

Update - Just added the "Image Path" column (as opposed to "Path") to the procmon output and these do display as the mapped drive.; June 21, 2013 at 4:16 AM
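
For anyone repeating this check on a large capture, the Procmon output can be saved as CSV and filtered with a short script. The sketch below assumes the export has a header row that includes "Path" and "Image Path" columns (the latter only appears if you added that column, as in the comment above). The function name and sample rows are made up for illustration.

```python
import csv
import io

BINARY_EXTENSIONS = (".exe", ".dll")


def binaries_on_mapped_drive(csv_text, drive_letter, columns=("Path", "Image Path")):
    """Return (column, value) pairs where a binary lives on the mapped drive.

    `csv_text` is the text of a CSV export with a header row. Columns that
    are missing from the export are simply skipped.
    """
    prefix = drive_letter.upper() + ":\\"
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column in columns:
            value = (row.get(column) or "").strip()
            on_drive = value.upper().startswith(prefix)
            is_binary = value.lower().endswith(BINARY_EXTENSIONS)
            if on_drive and is_binary:
                hits.append((column, value))
    return hits


# Hypothetical export: "Path" shows a UNC name while "Image Path" shows M:.
sample = (
    '"Process Name","Operation","Path","Image Path"\n'
    '"app.exe","CreateFile","\\\\FS\\Apps\\helper.dll","M:\\Apps\\app.exe"\n'
    '"app.exe","ReadFile","C:\\Data\\report.txt","C:\\Windows\\System32\\ntdll.dll"\n'
)
for column, value in binaries_on_mapped_drive(sample, "M"):
    print(column, value)
```

Checking both columns matters because, as noted above, the "Path" column can show a UNC name even when the process image was launched from the mapped drive.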
Jono said...: Have just tried the exact same scenario on Windows Server 2012 and NO crash - either using mapped drive or UNC path. So it looks like they did fix it in Server 2012 (either on purpose or by luck!); June 21, 2013 at 11:49 AM
Anonymous said...: Thanks, the Workaround with the DFS worked for me!; September 10, 2013 at 4:04 AM
Anonymous said...: Any news for Server 2008 R2 systems? I mean other than DFS or WebDAV?

thanks...; November 22, 2013 at 10:15 AM
Anonymous said...: Just for anyone who finds this on the web - we are having major problems with this during our upgrade from 2003:

Server 2012 still has the issue

We tried using webDav for the shares but for some strange reason having more than 50 shares on the terminal server seems to cause any subsequent shares to display a random 'Windows cannot access' your .exe message.

About to give the DFS trick a shot or the UNC one as we are getting desperate.

Both terminal server and file server are Windows 2012 R2; April 24, 2014 at 12:29 AM
Anonymous said...: Switching to a DFS share fixes the problem in our RDS2012 environment. Thanks for the tip!; June 17, 2014 at 8:24 AM
prolianta said...: This comment has been removed by the author.; March 17, 2015 at 3:25 PM
prolianta said...: This comment has been removed by the author.; March 17, 2015 at 3:30 PM
prolianta said...: Is this topic still alive? Is there a solution to this problem other than using WebDAV and DFS?; March 17, 2015 at 3:31 PM
Adrian Hayes said...: The issue still occurs on Windows 2012. I wouldn't use WebDav unless you like super-slow performance. Install locally on each server, use UNC paths or set up DFS instead.; March 17, 2015 at 4:23 PM
Unknown said...: For Windows Server 2008 R2 Standard we had same issue. The solution was to apply Microsoft Patch >> https://support.microsoft.com/en-us/kb/2732673 That resolved the issue. Fixed!; July 7, 2015 at 5:10 PM
prolianta said...: For Anup Mistry: I installed this patch (2732673), but it does not help...((; July 7, 2015 at 5:19 PM
Adrian Hayes said...: Anup Mistry - That case looks like it might be for another issue with Outlook. Were you getting the two errors (Event ID 1000 & 1005) in your Application event log?; July 7, 2015 at 7:59 PM
Anonymous said...: Have you vetted the Enterprise Rollup? This resolved the majority of my issues on Windows 2008 R2, but they're back on Windows 2012 R2.

https://support.microsoft.com/en-us/kb/2775511#/en-us/kb/2775511; November 16, 2015 at 12:11 PM
Anonymous said...: The hint from Wendell solved our issue.

In our case the MKLINK did the trick.
We changed our application from drive letter -> network mount-point.

...Here is a potential workaround that uses the MKLINK command to map a UNC path, which appears to bypass issues with FCB's:

1) run "MKLINK /D [localdir] [\\Server\Share]" to mount a local directory as a network mount-point. ...; June 16, 2016 at 6:07 AM
Anonymous said...: Hi there, MKLINK is still a little suspect to me. Does every application recognize it's a network path? E.g. a virus scanner should scan real local directories and files but not the network path (this one should be scanned by the fileserver, not by all the clients). Do image- or backup-tools include the linked folder? It can contain many gigabytes or even terabytes...

Does anyone have experience with this?

Greets Maertenz; September 23, 2016 at 7:14 AM
Unknown said...: Looks like the KB now says this issue is fixed in Server 2016.
https://support.microsoft.com/en-us/kb/2536487; October 31, 2016 at 12:56 PM
Adrian Hayes said...: Thanks for the update, Dennis.

Has anyone with Win Server 2016 tested this?; November 3, 2016 at 11:46 AM
Anonymous said...: Hi, I have an application on a terminal server that runs from a mapped drive. My users have been getting a message 'External Exception C0000006' when using features in the program. The MS article said that Server 2016 should resolve the issue, however, we have moved the users to a Server 2016 remote desktop server, and the issue still is present. I just tried disabling SMB2/3/OpLocks in hopes that it will fix the issue. I will post back as to whether it fixes the problem or not. Thanks; June 15, 2017 at 1:43 PM
Anonymous said...: By the way, before we moved the program to Server 2012, and finally Server 2016 (as recommended by Microsoft) it was running on Server 2003, and worked flawlessly. 2000/2003/XP used SMB1 which supposedly did not have Oplocks, so I am hoping that disabling SMB2/3/Oplocks on our new Server 2016 does the trick. Again, will post back if it eliminates the problem.

disable smb2/3 via cmd prompt:
sc config lanmanworkstation depend= bowser/mrxsmb10/nsi
sc config mrxsmb20 start= disabled; June 15, 2017 at 1:47 PM
Adrian Hayes said...: Server 2003 did not have the issue. It was introduced in Server 2008 and later.

The optimization makes sense - why have multiple copies of the same binary in memory when they are all the same? The only problem is they failed to account for when the user that owned the FCB discarded it.; September 20, 2017 at 8:24 PM
kevinthenerd said...: WebDAV is not only slow but is equivalent to a local installation, since it copies files to a local temp directory and then does a very poor job managing which copies should be purged.

C:\Windows\ServiceProfiles\LocalService\AppData\Local\Temp\TfsStore\Tfs_DAV

(WebDAV happens to be the only way I know to upload files to SharePoint Online, so I'm forced--by ignorance or otherwise--to use it daily with some code I wrote.); April 3, 2018 at 10:20 AM
BCas said...: Apparently MS has now changed the resolution in the KB to say all should be fixed in Server 2016, not sure if this is true or not?; May 18, 2018 at 7:47 AM
Anonymous said...: I may have found a clue. I have 2 out of 4 servers getting the C0000006 External Exception error and the one thing I just noticed is the two that are getting the errors are running 2008R2 "Standard" and the 2 that are not getting it are running 2008R2 "Enterprise". I am going to rebuild the 2 that are Standard as Enterprise and see if that fixes the issue.; May 21, 2018 at 11:32 PM
