
Wednesday, June 29, 2011

Win 2008 Terminal Server Network App Crashing

How many Microsoft Developers does it take to change a lightbulb? None. They just make darkness the new standard.

You can no longer safely run applications from a network location on a Windows 2008 or 2008 R2 terminal server. When a user who opened a program across the network logs off, the program crashes for all other users running it on the same server. It appears that Microsoft is no longer supporting functionality that has been present for 20+ years and is not going to fix the bug due to "Architectural Changes" in Windows 2008.

http://support.microsoft.com/kb/2536487

Here is a summary of the issue:
  1. When a user opens any network application in a 2008 or R2 Terminal Server environment, the OS creates a File Control Block (FCB). The FCB is a handle that the OS uses to access the program file loaded into memory.

  2. If another user on the terminal server opens the same program, the OS will give the second user access to the first user’s FCB and access to that part of the original user’s memory space that stores the executable.

  3. When the first user logs off, all their FCBs are dropped and become inaccessible to other users that were sharing them.

  4. The next action the remaining users perform in the program fails and it crashes the application because it cannot access the program files.
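
The four steps above can be sketched as a toy model. This is only an illustration of the behavior as described here, not real Windows internals; the class, method names, and the network path are all made up for the example.

```python
class ToyTerminalServer:
    """Toy model of the FCB sharing described above; not real Windows internals."""

    def __init__(self):
        self.fcb_owner = {}   # program path -> user whose FCB is being shared
        self.running = {}     # user -> set of program paths that user opened

    def open_program(self, user, path):
        # Steps 1 and 2: the first opener creates the FCB; later users share it.
        self.fcb_owner.setdefault(path, user)
        self.running.setdefault(user, set()).add(path)

    def log_off(self, user):
        # Step 3: the departing user's FCBs are dropped for everyone.
        self.running.pop(user, None)
        for path in [p for p, owner in self.fcb_owner.items() if owner == user]:
            del self.fcb_owner[path]

    def use_program(self, user, path):
        # Step 4: the next action fails if the shared FCB is gone.
        if path not in self.running.get(user, set()):
            raise ValueError(f"{user} is not running {path}")
        return "ok" if path in self.fcb_owner else "crash"


server = ToyTerminalServer()
app = r"X:\apps\example.exe"  # hypothetical network path
server.open_program("user1", app)
server.open_program("user2", app)
before = server.use_program("user2", app)
server.log_off("user1")
after = server.use_program("user2", app)
print(before, after)  # prints: ok crash
```

In this model, logging off user2 instead would leave user1 unaffected, which matches the point that only the logoff of the FCB holder matters.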

You can see this reported in the Application event log as two entries with the same time stamp:
  • Application Error, Event ID 1000 – “Faulting application xxxxx.exe, version xxxxx…”

  • Application Error, Event ID 1005 – “Windows cannot access the file for one of the following reasons: there is a problem with the network connection, the disk that the file is stored on, or the storage drivers installed on this computer; or the disk is missing. Windows closed the program XXXXX because of this error.”
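
If you want to scan an exported log for this signature, something like the following might help. It is a rough sketch that assumes you have already pulled the records into (timestamp, event_id) tuples; that shape and the sample records are invented for illustration, not taken from a real log.

```python
from collections import defaultdict


def find_fcb_crash_pairs(events):
    """Return timestamps at which both Event ID 1000 and 1005 were logged.

    `events` is an iterable of (timestamp, event_id) tuples, a hypothetical
    shape for records exported from the Application event log.
    """
    ids_by_time = defaultdict(set)
    for timestamp, event_id in events:
        ids_by_time[timestamp].add(event_id)
    # Keep only the timestamps where both IDs appear together.
    return sorted(t for t, ids in ids_by_time.items() if {1000, 1005} <= ids)


# Invented sample records, for illustration only.
sample = [
    ("2011-06-29 09:15:02", 1000),
    ("2011-06-29 09:15:02", 1005),
    ("2011-06-29 10:40:11", 1000),  # a crash without a matching 1005
]
pairs = find_fcb_crash_pairs(sample)
print(pairs)  # prints: ['2011-06-29 09:15:02']
```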

Microsoft says the "New Standard" is to load applications locally or use a WebDAV share. Local install works ok in one or two-server environments, not so well in 400+ server environments. Applying program updates to multiple servers is an administrative nightmare and a waste of resources as your environment grows. And WebDAV? Really?

We've gotten past Microsoft's Offshore Support Defense Forces and we're finally in discussions with high-level US engineers at Microsoft. We're attempting to convince them of the gravity of the issue and to resolve it. The engineers have acknowledged that they have received multiple reports of the issue but that MS development is refusing to fix it.

They said that the bug is coded deep in the 2008 OS and would require architectural redesign, so they are reluctant to make any changes. It looks like they tried to fix it in 2008 R2, because it now shares the FCB of the last user who opened the program file instead of the first, but it still crashes when they log off before another user grabs the ball.

It seems like they forked in the wrong direction and locked themselves into a faulty design.

Update - 2016.11.03 - The Microsoft KB article 2536487 states that Windows Server 2016 fixes this issue.  If you have installed Windows 2016 and verified that this is fixed, I would love to hear from you in the comments.  Thanks!

Update - 2012.05.01 - Multiple responders have reported that accessing the network files over a DFS share eliminates the FCB errors.  See comments below.

Update - 2011.09.23 - We have had some success running the programs from a UNC path instead of using the mapped network drive.  This is still in testing, but I believe you may also need to remove the mapped drive to eliminate the application crashes. If this works in your environment, please post back in the comments.

55 comments:

Anonymous said...

I'm with you 100%. I'm experiencing this issue supporting customers running my application from a Network Share and I've already notified them to not use Windows 2008 and 2008 R2.

Adrian Hayes said...

We've seen some mixed success with running the applications via UNC path instead of a mapped drive. I'm not sure if Windows shares FCBs when running via UNC.

I haven't nailed it down yet, but I believe it may succeed if you no longer have a mapped drive on the same network share.

Anonymous said...

We are experiencing this exact issue on a Windows 2008 TS/Xenapp server running applications from a network share on a Windows 2003 R1 server.

We will try changing the shortcut to a UNC path to see if that resolves the issue, but we don't have the option to remove the mapped drive as it will impact too many other applications.

Adrian Hayes said...

It's pretty easy to definitively test to see if it's resolved:

1) Log in as User 1 and launch the program from the UNC path. Use it for a little while - enough to load several binaries into memory

2) Log in as User 2 and launch the same program from the UNC path. Again, run a while.

3) If on 2008 R2, log off User 2. If on Windows 2008, non-R2, log off User 1.

4) Attempt to use the program on the remaining user session.

I find that it often takes a while to duplicate the application crash. It doesn't occur until the program needs to read from a binary (.exe, .dll, etc.) that it hasn't yet loaded into memory on the current session but had been accessed on the logged off session.
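
The two details that matter in this test (which session to log off, and why the crash can be delayed) can be sketched as below. This is a simplified model of the behavior described in this thread; the function names and file names are hypothetical.

```python
def session_to_log_off(os_version, first_user, second_user):
    """Pick the session whose logoff should trigger the crash (step 3)."""
    if os_version == "2008 R2":
        return second_user   # R2 shares the FCB of the last user to open the file
    if os_version == "2008":
        return first_user    # non-R2 shares the FCB of the first user
    raise ValueError(f"unexpected OS version: {os_version!r}")


def binaries_that_would_fail(loaded_by_survivor, needed_next):
    """Binaries the surviving session still has to read after the logoff.

    Per the description above, the crash only shows up once the program
    needs a binary it has not yet loaded in the current session.
    """
    return [b for b in needed_next if b not in loaded_by_survivor]


print(session_to_log_off("2008 R2", "User 1", "User 2"))  # prints: User 2
print(binaries_that_would_fail({"app.exe"}, ["app.exe", "report.dll"]))  # prints: ['report.dll']
```

An empty result from the second function corresponds to the case where the remaining session keeps working for a while because everything it needs is already in memory.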

Phil Salomon said...

Hi. We have just implemented a Citrix farm and this is one of the problems we experienced.

Can you post again if you get anywhere with Microsoft?

Adrian Hayes said...

Phil,

I'm not holding my breath for a hotfix. Even if one were forthcoming, our Enterprise contact told us it would be at least Windows 9 (read: 5-7 years) before it was implemented. MS has pinned their performance improvements on the features that cause this issue.

We're mitigating the issue by implementing Citrix Provisioning Server to push software updates out to our servers. It's not pretty, but it will come with the side benefit of reducing the load on our filers.

Anonymous said...

One workaround to this is to create multiple shares for the same physical path and assign each share to a single user. For example, physical path F:\SHARED is shared as both \\FS\USER1 and \\FS\USER2. USER1 maps the X: drive to \\FS\USER1 while USER2 maps the X: drive to \\FS\USER2. Both USER1 and USER2 will access the same files and folders. The open question is the maximum number of shares per folder. This has been tested and worked without problems for two users, but it is unclear what will happen with 100 or more users.
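
For anyone wanting to try the one-share-per-user idea with more than a couple of users, the commands could be generated rather than typed. This is a sketch only: the helper name is made up, it just builds `net share` and `net use` command strings without running them, and it assumes paths without spaces.

```python
def per_user_share_commands(physical_path, server, users, drive="X:"):
    """Build one share per user for the same folder, plus the matching drive mapping."""
    commands = []
    for user in users:
        share = user.upper()
        commands.append({
            "user": user,
            # Run on the file server to publish the folder under a per-user name.
            "on_file_server": f"net share {share}={physical_path}",
            # Run in that user's session (for example from a logon script).
            "in_user_session": f"net use {drive} \\\\{server}\\{share}",
        })
    return commands


for entry in per_user_share_commands(r"F:\SHARED", "FS", ["USER1", "USER2"]):
    print(entry["on_file_server"])
    print(entry["in_user_session"])
```

Whether this scales to 100 or more shares on one folder is exactly the unanswered question above; the script only saves typing.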

Mike S. said...

I have been unable to find a hotfix or Windows Update that fixes this issue (which we see sporadically). Does anyone else have an update on this?

Anonymous said...

Hi,
if you must continue to use your apps through a network drive in Terminal Services on Windows 2008, you can use DFS as explained here: http://social.technet.microsoft.com/Forums/en-US/winserverTS/thread/d7bab29e-2bce-4785-9abb-259cff3a153e?prof=required

Carlo

Adrian Hayes said...

Carlo - very interesting. I'll have to try that. Thanks.

smicke said...

I'm using a UNC path at the moment and it works OK, but not 100%. Has anyone heard anything more about this issue?

Anonymous said...

I believe that we are experiencing the same issue in our environment. I know that it has been asked, but what kind of results are we getting with the UNC path?

Anonymous said...

Just some follow-up questions:

1. Is the issue caused by the closing of the network app or logging off of the RDP session?

2. Does it matter if the current FCB owner closes out the network app cleanly before logging off?

Adrian Hayes said...

What happens if you add your server to the Local Intranet zone in Internet Options? I've had that resolve some performance issues.

AFAIK, the holder of the FCB maintains it even if they close the file. I would imagine it would get unloaded from memory only if there are no further references to it by other users. However, I have not tested this scenario.

The issue triggers when the user who is currently holding the FCB logs off.

Paul Jeffries said...

I moved the application so it ran from C:\ rather than from a mapped drive and have seen a significant improvement.

Scott said...

This issue has been causing us nothing but grief as well where I work...

A network application that has worked great for years now throws fits when set up on our new Windows 2008 (32-Bit)/XenApp Server environment. The application is quite old but a critical component of the business; so we had to accommodate it (including using 2008 rather than 2008 R2 unfortunately).

After a lot of fix attempts and numerous chats with the application's vendor, it was clear that it was a known issue. Their workaround was messy, requiring us to copy the .exe files to a local folder and then use old DOS-based SUBST commands to point to it.
We then found this MS KB article:

http://support.microsoft.com/kb/2536487

WebDAV? Really? It was impractical and TERRIBLY slow (NOT WORKABLE). We opened an incident with Microsoft, we got a canned response that there was a HotFix. Only to discover there was NO hotfix. Only to discover they were internally TESTING a Hotfix.. etc... etc... So, nothing useful in this case.

Then, as a longshot, we tried DFS shares instead of standard shares. Needless to say I was very suspect that this would do any good since neither Microsoft nor the Vendor suggest it as a workaround.

However, since implementing DFS shares and pointing our XenApp servers to the DFS shares we have seen a DRAMATIC reduction in the number of errors. To the point that we can now proceed with migrating users off of old Citrix servers.
I don't know if this info will help anyone else with their network-run legacy applications - but - I certainly hope so!

Good Luck!

Adrian Hayes said...

Scott - interesting news on the hotfix. I was under the impression that we'd sooner be buying parkas and snow boots for our Hades vacations than the emergence of a hotfix. I'll keep an eye out for that.

I can confirm the performance issues with WEBDAV. In our testing it took 20-30 seconds just to bring up a directory listing on a folder with a handful of files.

Thanks for the update on using DFS paths. I guess that is a workable alternative.

Anonymous said...

We have implemented the DFS method with success.

We are monitoring.

Phillip Salomon said...

Is this an issue only with applications, or with any file shared on a mapped network drive? We are getting unexpected network errors in general, not just with applications (we have moved our apps to the C:\ folder on every Citrix server).

Adrian Hayes said...

Phillip,

No, this issue only affects running binary executable files from a mapped network drive. We did extensive testing with all types of data files and could not duplicate the issues if the program files were loaded locally.

The problem arises because the terminal server is sharing the executable memory space between users. Data would be inappropriate to share outside a user's session. The executable files are the same for everyone and require no privacy between sessions.

Adrian Hayes said...

Phillip - one other note. Did you move the programs and just redirect the shortcuts / published applications or did you uninstall and reinstall them? If you did not reinstall, you may have binaries still registered to the network location. If you haven't unmapped the drive, try unmapping it and seeing if the program fails to run or the errors cease.

You can do a definitive test by getting a copy of Procmon from http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx. Run your program and look for any references to .exe or .dll files on your network location. If they're getting accessed from the mapped drive, the bug is still in effect.

Phillip Salomon said...

Hi Adrian
We are definitely seeing the issue with normal data files, not just executables. Looking at the return value from GetLastError, it's returning either 59 or 64.

ERROR_UNEXP_NET_ERR 59 (0x3B)
An unexpected network error occurred.
ERROR_NETNAME_DELETED 64 (0x40)
The specified network name is no longer available.

It seems to affect some of our users worse than others. I just tried to close and reopen the file but that has not worked either.
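
If you are logging these failures from your own code, a small lookup for the two codes quoted above keeps the log readable. The helper name is made up; the two codes and their messages are the ones listed in the comment.

```python
NETWORK_DROP_ERRORS = {
    59: ("ERROR_UNEXP_NET_ERR", "An unexpected network error occurred."),
    64: ("ERROR_NETNAME_DELETED", "The specified network name is no longer available."),
}


def describe_network_error(code):
    """Format one of the two Win32 error codes reported above for a log line."""
    name, text = NETWORK_DROP_ERRORS.get(
        code, ("UNKNOWN", "Not one of the two codes discussed here.")
    )
    return f"{name} {code} (0x{code:02X}): {text}"


print(describe_network_error(59))
print(describe_network_error(64))
```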

Adrian Hayes said...

Phillip - you could be experiencing a problem caused by OpLocks. Check out Jeffry Land's post here. I've had a few instances where disabling Opportunistic Locking has fixed some strange data file issues.

With Win 2008, you will need to disable SMB 2 as well to disable OpLocks. I've used this page as a guide. However, you may see some performance hits if you do this. I would try it as a test. If it doesn't fix the issue, I would re-enable SMB 2 and OpLocks.

Adrian Hayes said...

An interesting side note: Microsoft KB 2536487 now lists running from a UNC path as a viable workaround. It previously included only local install or WebDAV as workarounds. However, still no mention of a hotfix.

Anonymous said...

They listed UNC as a workaround because it worked for their Global Escalation Tech in a quick and dirty trial.

I used that method with 35 users on 2k8R2/XA6.5 and the issue still presented itself. I had no choice but to roll back all users to gain consistency.

Of course MS wants more logs, but I'm not willing to put my users, nor local IT staff through any more pain in having the issue occur to capture logs.

We're attempting a hotfix request as this is a HUGE setback for our migration - 500+ servers, 25K users.

Adrian Hayes said...

Good luck with that hotfix request. :)

We've been very happy with Provisioning server. The farms that we switched over are far more stable and the performance has been great.

Anonymous said...

We are going to pilot having the app local to the citrix server, on the d drive (local cache drive), which will allow any updates to the code to persist. This prevents the citrix admins from having to always coordinate a private image with the app team when they need to deploy updates.

Wendell said...

Here is a potential workaround that uses the MKLINK command to map a UNC path, which appears to bypass issues with FCBs:

1) Run "MKLINK /D [localdir] [\\Server\Share]" to mount a local directory as a network mount-point. If this is all that you need, you can stop there. If you also need a drive letter assigned, go on to step #2.

2) Use SUBST to map a drive letter to the mount-point, such as "SUBST M: C:\localdir"

So far seems to work well.
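
The two steps could be scripted along these lines. This is a sketch that only builds the command strings and does not execute anything; the function name is hypothetical, and `cmd /c` is used because MKLINK is a built-in of the command interpreter rather than a standalone program.

```python
def mklink_workaround_commands(local_dir, unc_path, drive_letter=None):
    """Build the MKLINK (and optional SUBST) commands from the two steps above."""
    # Step 1: directory symbolic link from a local folder to the network share.
    commands = [f'cmd /c mklink /D "{local_dir}" "{unc_path}"']
    if drive_letter:
        # Step 2: optional drive letter pointing at the local mount-point.
        commands.append(f'subst {drive_letter}: "{local_dir}"')
    return commands


for command in mklink_workaround_commands(r"C:\localdir", r"\\Server\Share", "M"):
    print(command)
```

Leaving out the drive letter returns only the MKLINK command, matching the "stop there" case in step 1.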

Adrian Hayes said...

Interesting. How is performance?

Anonymous said...

I have tried the MKLINK and SUBST workaround, but no luck. My application crashed at the first try. The bug is easy to reproduce. I have tested the UNC path too, but with no luck. No fix so far.

We have only 6 terminal servers, so I'm going to copy the files locally. It's definitely not the solution I'm looking for, but it's the only one we have right now.

Adrian Hayes said...

Ensure that you don't have registered binaries going back to the mapped network drive.

Get a copy of Process Monitor (http://technet.microsoft.com/en-us/sysinternals/bb896645) and run it on the terminal server with only one user on the box. Launch your application while capturing disk activity. Filter by path and enter your mapped network drive. If you see the mapped drive on any of the .dll or .exe files, you're still experiencing the FCB bug.

Anonymous said...

Does anyone know if this is fixed in server 2012 or has tested it on a 2012 server?

Jono said...

I've been testing this for a client. I have a simple setup of a single terminal server with two users, plus a file server where our app is installed. Our app has a 'workstation setup' which I ran from a UNC path (this sets all file paths to UNC paths). After this, I cannot reproduce the problem when running the app from the UNC path. If I map a drive to the network location and run from there, I can replicate the issue every time. Even if I leave the drive mapped and revert to running from UNC, I cannot reproduce the problem. Am I missing something, as other users report they still get problems using UNC? I would also like to know if the problem is fully resolved in Server 2012. Anyone know?

Adrian Hayes said...

If all your references to executables in the registry point to the UNC path, you should be OK.

For those that still get errors, there are probably still references back to the mapped drive. Procmon from Windows Sysinternals (http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx) is a great resource to ferret this out. Watch what files the program opens and see if any paths point back to the mapped drive.

Another possibility is that there is another issue coming into play in addition to the FCB bug. By removing the FCB bug crashes, the other errors that happened less frequently will stand out.

I haven't tested on Windows 2012, but my Microsoft engineer said don't expect it anytime soon (he mentioned until after Windows 9, if fixed at all).

Jono said...

Just after posting I tried the procmon test and didn't see any references to the mapped drive path. Interestingly, I repeated the procmon test when running the app from the mapped drive. All the entries in the log still showed a UNC path for the executables!

Update - Just added the "Image Path" column (as opposed to "Path") to the procmon output and these do display as the mapped drive.
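
The Procmon check discussed above can be automated against a saved CSV export. This is a sketch under assumptions: it takes the export as text, assumes the columns are named "Path" and "Image Path" (the latter only appears if you added it to the display before saving), and uses invented sample rows and an X: drive letter.

```python
import csv
import io

BINARY_EXTENSIONS = (".exe", ".dll")


def binaries_on_mapped_drive(csv_text, drive="X:", columns=("Path", "Image Path")):
    """Return sorted binary paths from a Procmon CSV export that sit on `drive`."""
    hits = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column in columns:
            value = (row.get(column) or "").strip()
            lowered = value.lower()
            # Only binaries on the mapped drive are of interest here.
            if lowered.startswith(drive.lower()) and lowered.endswith(BINARY_EXTENSIONS):
                hits.add(value)
    return sorted(hits)


# Invented rows: "Path" shows a UNC path while "Image Path" shows the mapped drive.
sample = r'''"Process Name","Operation","Path","Image Path"
"app.exe","CreateFile","\\fs\apps\helper.dll","X:\apps\app.exe"
"app.exe","ReadFile","X:\apps\data.txt","X:\apps\app.exe"
"other.exe","CreateFile","C:\Windows\System32\kernel32.dll","C:\tools\other.exe"
'''
found = binaries_on_mapped_drive(sample)
print(found)
```

Checking both columns matters because, as noted above, the "Path" column alone can show UNC paths even when the image was launched from the mapped drive.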

Jono said...

Have just tried the exact same scenario on Windows Server 2012 and NO crash - either using mapped drive or UNC path. So it looks like they did fix it in Server 2012 (either on purpose or by luck!)

Anonymous said...

Thanks, the Workaround with the DFS worked for me!

Anonymous said...

Any news for Server 2008 R2 systems? I mean other than DFS or WebDAV?

thanks...

Anonymous said...

Just for anyone who finds this on the web - we are having major problems with this during our upgrade from 2003:

Server 2012 still has the issue

We tried using webDav for the shares but for some strange reason having more than 50 shares on the terminal server seems to cause any subsequent shares to display a random 'Windows cannot access' your .exe message.

About to give the DFS trick a shot or the UNC one as we are getting desperate.

Both terminal server and file server are Windows 2012 R2

Anonymous said...

Switching to a DFS share fixes the problem in our RDS2012 environment. Thanks for the tip!

prolianta said...

Is this topic still alive? Is there a solution to this problem other than using WebDAV and DFS?

Adrian Hayes said...

The issue still occurs on Windows 2012. I wouldn't use WebDav unless you like super-slow performance. Install locally on each server, use UNC paths or set up DFS instead.

Anup Mistry said...

For Windows Server 2008 R2 Standard we had same issue. The solution was to apply Microsoft Patch >> https://support.microsoft.com/en-us/kb/2732673 That resolved the issue. Fixed!

prolianta said...

For Anup Mistry: I installed this patch (2732673), but it does not help... :(

Adrian Hayes said...

Anup Mistry - That case looks like it might be for another issue with Outlook. Were you getting the two errors (Event ID 1000 & 1005) in your Application event log?

Anonymous said...

Have you vetted the Enterprise Rollup? This resolved the majority of my issues for Windows 2008 R2, but they're back in Windows 2012 R2.

https://support.microsoft.com/en-us/kb/2775511#/en-us/kb/2775511

Anonymous said...

The hint from Wendell solved our issue.

In our case the MKLINK did the trick.
We changed our application from drive letter -> network mount-point.

...Here is a potential workaround that uses the MKLINK command to map a UNC path, which appears to bypass issues with FCB's:

1) run "MKLINK /D [localdir] [\\Server\Share]" to mount a local directory as a network mount-point. ...

Anonymous said...

Hi there, MKLINK is still a little suspect to me. Does every application recognize it's a network path? E.g. a virus scanner should scan real local directories and files but not the network path (this one should be scanned by the fileserver, not by all the clients). Do image- or backup-tools include the linked folder? It can contain many gigabytes or even terabytes...

Does anyone have experience with this?

Greets Maertenz

Dennis Peabody said...

Looks like the KB now says this issue is fixed in Server 2016.
https://support.microsoft.com/en-us/kb/2536487

Adrian Hayes said...

Thanks for the update, Dennis.

Has anyone with Win Server 2016 tested this?

Anonymous said...

Hi, I have an application on a terminal server that runs from a mapped drive. My users have been getting a message 'External Exception C0000006' when using features in the program. The MS article said that Server 2016 should resolve the issue, however, we have moved the users to a Server 2016 remote desktop server, and the issue still is present. I just tried disabling SMB2/3/OpLocks in hopes that it will fix the issue. I will post back as to whether it fixes the problem or not. Thanks

Anonymous said...

By the way, before we moved the program to Server 2012, and finally Server 2016 (as recommended by Microsoft) it was running on Server 2003, and worked flawlessly. 2000/2003/XP used SMB1 which supposedly did not have Oplocks, so I am hoping that disabling SMB2/3/Oplocks on our new Server 2016 does the trick. Again, will post back if it eliminates the problem.

Disable SMB2/3 via cmd prompt:
sc config lanmanworkstation depend= bowser/mrxsmb10/nsi
sc config mrxsmb20 start= disabled

Adrian Hayes said...

Server 2003 did not have the issue. It was introduced in Server 2008 and later.

The optimization makes sense - why have multiple copies of the same binary in memory when they are all the same? The only problem is that they failed to account for what happens when the user that owned the FCB discards it.