Wow, does this ever suck.
Ok, so here's the problem. I have a service that takes an XML file, which contains a list of parts from another system, and I need to synchronize the list of parts with our system. In order to do this, the service loops through the list of parts, and for each one, it attempts to look up the part in our system. If it exists, it gets it, and updates its properties with data from the file. If it doesn't, it creates a new one, with properties from the file. It gathers all these creations/changes into a collection, and then saves the batch to the database.
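In code, the loop boils down to something like the sketch below. It's only an illustration: `Part`, `PartSync`, and the two delegates (`lookup` and `saveBatch`, standing in for the proxy's "get part" and "save batch" calls) are names I made up for this post, not the real service's API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the part type exposed by the web service proxy.
public class Part
{
    public string Number;
    public string Description;
}

public static class PartSync
{
    // lookup and saveBatch are assumed placeholders for the real proxy calls.
    public static List<Part> Synchronize(
        Func<string, Part> lookup,
        Action<List<Part>> saveBatch,
        IEnumerable<Part> incoming)
    {
        var changes = new List<Part>();
        foreach (Part source in incoming)
        {
            // One web service round trip per part; null means "not found".
            Part target = lookup(source.Number);
            if (target == null)
            {
                target = new Part { Number = source.Number };
            }

            // Copy the properties from the file onto the new or existing part.
            target.Description = source.Description;
            changes.Add(target);
        }

        // A single save for the whole batch.
        saveBatch(changes);
        return changes;
    }
}
```

The thing to notice is that `lookup` runs once per part, so a 5,000-part file means 5,000 separate calls to the service before anything gets saved.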
Seems pretty straightforward, right? The catch is, the interface to our system is through an authenticated web service.
Well, gee, that's not so tough, right? After all, in .NET, you just create a reference to the web service, and it does most of the work for you, creating all the wrapper classes and so forth. And, most of the time, it's as simple as that.
The problem, though, is that in this sample data set, we had about 5,000 parts. For each part, a query had to be made to the web service: Give me this part. (It'll return either the part, if it exists, or a null, meaning it doesn't, and I have to create one.) And for some reason, after about 3,800 or 3,900 calls to the web service in rapid succession, it would just quit. "Unable to connect to web service." The inner exception revealed a little more detail: "Only one usage of each socket address (protocol/network address/port) is normally permitted."
Huh?
After considerable digging and googling, I finally unearthed this blog post by Durgaprasad Gorti, which reveals the problem. An authenticated call closes the connection, but the Windows TCP stack holds the socket in a "TIME_WAIT" state for four minutes by default before it can be reused. While he does offer a registry hack to tell Windows to cut that time shorter, I wanted to find a way to do it in code, so it's one less variable to keep track of on a client's machine.
Unfortunately, all of my experimentation proved fruitless. No matter how I played with the ServicePoint, trying to forcibly close it, setting its timeouts to minimum values, whatever, the sockets stayed open too long.
So much for trying to out-think the Microsoft guy.
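His code-based solution, therefore, is the one I'm using. Unfortunately, it's not great, in that it basically just delays the problem: by expanding the range of local ports the service can bind to, it raises the limit from under 4,000 calls to about 60,000. (The original ceiling is about what you'd expect if Windows hands out local ports from a default range of roughly 1025 to 5000, as older versions do.)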
He gives the basics of how to implement it, but unfortunately he doesn't indicate where the code needs to go. Fortunately, I found another blog post, by Kamil Pakur, that gave me just the clue I needed. (Incidentally, he's trying to solve the same problem, but his solution, forcing KeepAlive to false and the HTTP protocol version to 1.0, didn't change anything in my scenario; it still crashed in under 4,000 calls. In fact, it would seem that KeepAlive=false, which is automatic in an authenticated scenario, is the source of the problem.)
So, here's what I did:
- Copied the "namespace" and "public partial class" lines from the auto-generated Reference.cs file representing my web service into a new code file.
- In that file, copied Gorti's public static IPEndPoint BindIPEndPointCallback method.
- Added to that file a protected static int m_LastBindPortUsed = 5001; line (which is used in the "BindIPEndPointCallback" method).
- Added an override of the service's GetWebRequest method that sets the ServicePoint.BindIPEndPointDelegate to the BindIPEndPointCallback method (the first line of Gorti's code block).
My entire class file looks a lot like this:
    namespace ProjectName.ServiceName
    {
        public partial class Service : System.Web.Services.Protocols.SoapHttpClientProtocol
        {
            // Route every outgoing request through the port-selecting callback below.
            protected override System.Net.WebRequest GetWebRequest(System.Uri uri) {
                System.Net.HttpWebRequest webRequest = (System.Net.HttpWebRequest)base.GetWebRequest(uri);
                webRequest.ServicePoint.BindIPEndPointDelegate = new System.Net.BindIPEndPoint(BindIPEndPointCallback);
                return webRequest;
            }

            //protected override System.Net.WebResponse GetWebResponse(System.Net.WebRequest request) {
            //    if (request is System.Net.HttpWebRequest) {
            //        System.Net.HttpWebRequest httpRequest = (System.Net.HttpWebRequest)request;
            //        System.Net.WebResponse response = base.GetWebResponse(httpRequest);
            //        httpRequest.ServicePoint.MaxIdleTime = 1;
            //        httpRequest.ServicePoint.ConnectionLeaseTimeout = 1;
            //        httpRequest.ServicePoint.CloseConnectionGroup(httpRequest.ServicePoint.ConnectionName);
            //        return response;
            //    } else
            //        return base.GetWebResponse(request);
            //}

            // Last local port handed out; static, so shared by every instance of the proxy.
            protected static int m_LastBindPortUsed = 5001;

            public static System.Net.IPEndPoint BindIPEndPointCallback(
                System.Net.ServicePoint servicePoint,
                System.Net.IPEndPoint remoteEndPoint,
                int retryCount) {
                // Take the next port, then wrap the counter back to 5001 once it reaches 65534.
                int port = System.Threading.Interlocked.Increment(ref m_LastBindPortUsed);
                System.Threading.Interlocked.CompareExchange(ref m_LastBindPortUsed, 5001, 65534);
                if (remoteEndPoint.AddressFamily == System.Net.Sockets.AddressFamily.InterNetwork) {
                    return new System.Net.IPEndPoint(System.Net.IPAddress.Any, port);
                } else {
                    return new System.Net.IPEndPoint(System.Net.IPAddress.IPv6Any, port);
                }
            }
        }
    }
(I left in the code for the GetWebResponse override, just so you can see some of the things I tried to clear up the sockets. It's all commented out now, of course, because it, quite simply, just doesn't do a blasted thing.)
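The service now completes the run on our test data. However, I'm still not comfortable with the solution. There is still an upper limit to the amount of data it can process at a time: the wider port range only stretches to about 60,000 calls before ports that may still be sitting in TIME_WAIT come around for reuse.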
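To see where that ceiling comes from, the counter logic can be pulled out of the callback and run on its own. This is a minimal sketch, assuming the same 5001 starting value and 65534 wrap point as the class above; `PortCycler` and `NextPort` are names invented here for illustration.

```csharp
using System.Threading;

// Hypothetical helper that isolates the counter from the bind callback.
public static class PortCycler
{
    public const int FirstPort = 5001;
    public const int LastPort = 65534;

    private static int lastBindPortUsed = FirstPort;

    // Same increment-then-wrap steps as the callback, minus the IPEndPoint.
    public static int NextPort()
    {
        int port = Interlocked.Increment(ref lastBindPortUsed);

        // If the counter has just reached LastPort, reset it so the next call starts over.
        Interlocked.CompareExchange(ref lastBindPortUsed, FirstPort, LastPort);
        return port;
    }
}
```

Calling `NextPort()` over and over hands out 5002 up through 65534 and then starts again at 5002, which works out to 60,533 distinct local ports. If a run needs more calls than that before the earliest sockets have left TIME_WAIT, I'd expect the original error to come right back.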
Maybe a better solution is to ship the file across to the web service and do the processing there. (It would certainly make for a cleaner interface, an actual web service that does tasks, instead of the glorified and bloated data access layer we have now.) But that, too, is a double-edged sword. Web services have limits on how much data can be shipped, plus timeouts on how long the client will wait for a response.
What really surprised me in researching this is how little information there was on this problem. I guess calling an authenticated web service in a loop isn't a common scenario. It was only when I googled the text of the inner exception that I found it, amongst a lot of results pointing to people actually opening a lot of TCP connections manually.
I did come across a few posts of people complaining about web services failing in a loop, but the responses (when there were any given) were nowhere near the actual solution (suggesting a timeout issue with the session or authentication cookies). Maybe this post will be of more help, since I tried to bring the problem and solution together.