Recently we came across a situation where customers noted significantly different performance depending on how a BA report is run. A scheduled report completes within an acceptable time frame, but the same report becomes sluggish when triggered through the Order Report functionality. Users literally experienced the IFS IEE Client hanging for hours.
Investigating further, we noticed that the behaviour differs between the customer's network and the support net. Moreover, the problem is not specific to BA reports; it affects other application flows as well. We observed this trend among managed-services customers.
Problem
When a client call is sent from the application, it goes to the middleware server and then to the database. The response is returned from the database to the middleware server, but the communication from the middleware server back to the client is lost. We therefore initiated a further investigation into the network configuration to understand why the response from the middleware server to the client behaves differently when public versus VPN access is used. The issue also does not occur on the support net.
- Client to middleware server: succeeds -> the DB gets called
- Middleware server to DB: succeeds -> the DB call completes
- Response from middleware server to client: fails (when accessed via a public network)
Furthermore, the issue does not occur when the report is scheduled, because no response needs to be returned to the client. This allowed us to narrow the problem down to the communication channel between the middleware server and the client.
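The failure mode above can be sketched with plain sockets (ports, payloads, and timings below are illustrative stand-ins, not the actual IFS protocol): a "middleware" that takes longer to reply than the connection is allowed to stay idle, and a client that blocks on the response and sees nothing but silence.

```python
import socket
import threading
import time

def demo(client_timeout):
    """Simulate a long-running report over a connection with an idle limit."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", 0))  # ephemeral port
    srv.listen(1)
    port = srv.getsockname()[1]

    def middleware():
        conn, _ = srv.accept()
        conn.recv(1024)              # receive the "report request"
        time.sleep(0.5)              # the report takes a while to produce
        try:
            conn.sendall(b"report done")  # response the client may never see
        except OSError:
            pass                     # client already gave up
        conn.close()

    threading.Thread(target=middleware, daemon=True).start()

    cli = socket.create_connection(("127.0.0.1", port))
    cli.settimeout(client_timeout)   # stand-in for the LB's 4-minute idle timeout
    cli.sendall(b"run report")
    try:
        return cli.recv(1024).decode()
    except socket.timeout:
        return "hang: no response"
    finally:
        cli.close()
        srv.close()
```

With a generous timeout, `demo(2.0)` returns the report; with a timeout shorter than the processing time, `demo(0.2)` reports the hang, just as the IEE client waits indefinitely once the load balancer has silently dropped the idle flow.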
Solution
The Azure configuration for the VM's public IP turned out to be the cause: the TCP idle timeout for the Azure Load Balancer was set to 4 minutes.
Once we increased that value to 30 minutes and ran the report again from a public network, it worked perfectly.
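The change can be applied programmatically; a hedged sketch with the `azure-mgmt-network` SDK is below (the subscription, resource group, and public IP names are placeholders, and the same `idle_timeout_in_minutes` setting is also exposed on load-balancer rules).

```python
# Sketch only: raising the TCP idle timeout on a VM's public IP address.
# Requires azure-identity and azure-mgmt-network; names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

client = NetworkManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Fetch the current public IP configuration, bump the idle timeout, and update.
pip = client.public_ip_addresses.get("my-resource-group", "my-vm-public-ip")
pip.idle_timeout_in_minutes = 30  # maximum value supported by Azure
client.public_ip_addresses.begin_create_or_update(
    "my-resource-group", "my-vm-public-ip", pip
).result()
```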
https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-tcp-idle-timeout
Suggestions for Improvements
If the TCP idle timeout is exceeded, the IFS application hangs during the communication from the middleware server back to the client. Although the IEE client keeps waiting for the response, the connection has already been lost due to the Azure Load Balancer's TCP idle timeout.
It may be that the .NET access provider cannot detect the timeout, or that the timeout notification is dropped. Introducing a 'keep-alive' message from the application server to the client would prevent the connection from timing out while the client is waiting for the report.
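One transport-level way to achieve this, sketched below with assumed interval values, is enabling TCP keep-alive on the socket so probes flow more frequently than the load balancer's idle timeout, keeping the connection alive while the client waits.

```python
import socket

def enable_keepalive(sock, idle=60, interval=30, probes=5):
    """Turn on TCP keep-alive with probe timing well under a 4-minute idle cap.

    The idle/interval/probes values are illustrative; the fine-grained options
    are Linux-specific and the constant names differ on Windows/macOS.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock
```

An application-level heartbeat message from the server would serve the same purpose and works regardless of OS socket options, since any traffic on the flow resets the load balancer's idle timer.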
Summary
Once the TCP idle timeout of the Azure Load Balancer is exceeded, the whole IFS application gets stuck without any timeout message.
The maximum idle timeout that can be set is 30 minutes. If an operation that returns a response to the client runs for more than 30 minutes, the IFS application will hang. This is therefore a reliability concern for customers running IFS on Azure. For now, scheduling the report, or any other functional flow where that is possible, is a workable workaround. In conclusion, we suggest this is a valid point to consider when reviewing the existing architecture for future releases.