Tip: TCP idle timeout settings for Azure Load Balancer and IEE Client

  • 10 December 2019

Userlevel 7
Badge +19

Recently we came across a situation where customers noticed significantly different performance depending on how a BA report is run. A scheduled report completes within an acceptable time frame, but the same report becomes extremely slow when it is ordered through the Order Report functionality. Users literally experienced hours of hanging in the IFS IEE Client.

 

On further investigation, we noticed that the behavior differs between the customer's network and support net. Moreover, the problem is not limited to BA reports; it affects other application flows as well. We observed this pattern among managed services customers.

 

Problem

When the client call is sent from the application, it goes to the middleware server and then to the database. The response is then returned from the database to the middleware server, but it is lost on the way from the middleware server to the client. We therefore investigated the network configuration to find out why the response from the middleware server to the client behaves differently over public access versus VPN access. The issue does not occur in support net either.

  • Client to Middleware server
    Succeeds -> The DB gets called ☑
     
  • Middleware Server to DB
    Succeeds -> The DB call is ended ☑
     
  • Response from Middleware Server to Client
    Fails (when accessed via public network) ❎

Furthermore, this issue does not occur when the report is scheduled, because no response needs to be returned to the client. This let us narrow the problem down to the communication channel between the middleware server and the client.

 

Solution

The cause turned out to be the Azure configuration for the public IP of the VM: the TCP idle timeout setting of the Azure Load Balancer, which was set to 4 minutes.

Figure 1 : TCP Idle Timeout

Once we increased that value to 30 minutes and ran the report again from a public network, it worked perfectly.

https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-tcp-idle-timeout
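
To make the effect of the setting concrete, here is a minimal Python sketch of the failure mode. It is a simplified model, not how Azure Load Balancer is implemented: it assumes the connection carries no traffic while the server is working and that the load balancer silently forgets a flow once it has been idle for the configured timeout. The function name `response_delivered` and the 15-minute report duration are made up for illustration.

```python
from typing import Optional


def response_delivered(call_seconds: float,
                       idle_timeout_seconds: float,
                       keepalive_interval_seconds: Optional[float] = None) -> bool:
    """Return True if the reply can still reach the client in this simplified model.

    Assumption: the connection is silent while the server works, and the
    load balancer drops a flow that stays idle for idle_timeout_seconds
    or longer.
    """
    if keepalive_interval_seconds is not None:
        # With periodic keep-alive packets, the longest silent gap is the
        # keep-alive interval (or the whole call, if the call is shorter).
        longest_idle_gap = min(call_seconds, keepalive_interval_seconds)
    else:
        longest_idle_gap = call_seconds
    return longest_idle_gap < idle_timeout_seconds


if __name__ == "__main__":
    four_minutes = 4 * 60
    thirty_minutes = 30 * 60
    report_run = 15 * 60  # hypothetical 15-minute report

    print(response_delivered(report_run, four_minutes))     # False
    print(response_delivered(report_run, thirty_minutes))   # True
    print(response_delivered(40 * 60, thirty_minutes))      # False
    print(response_delivered(40 * 60, thirty_minutes, 60))  # True
```

In this model, raising the timeout from 4 to 30 minutes rescues the 15-minute call, but a 40-minute call still fails unless something keeps the connection from going idle.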

 

Suggestions for Improvements

If the TCP idle timeout is exceeded, the IFS application hangs during the communication from the middleware server to the client. The IEE client keeps waiting for the response, but the response is lost because of the TCP idle timeout setting of the Azure Load Balancer.


Maybe the .NET access provider cannot detect the timeout, or the timeout response is dropped. One improvement would be to introduce a 'keep-alive' message from the application server to the client, so that the connection does not time out while the client is waiting for the report.
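
As a rough illustration of the keep-alive idea, the Python sketch below turns on OS-level TCP keep-alive probes for a socket. This is an assumption about one possible approach, not how the IFS middleware or the .NET access provider works, and it is a transport-level keep-alive rather than an application-level message. The helper name `enable_tcp_keepalive` and the default values are hypothetical; the probes would need to be sent more often than the load balancer's idle timeout to have any effect.

```python
import socket


def enable_tcp_keepalive(sock: socket.socket,
                         idle_seconds: int = 120,
                         interval_seconds: int = 30,
                         probe_count: int = 5) -> None:
    """Turn on OS-level TCP keep-alive probes for one socket."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options below are platform specific, so each one is
    # applied only if this Python build exposes it.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_seconds)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probe_count)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        enable_tcp_keepalive(s)
        # A non-zero value means keep-alive is switched on for this socket.
        print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
```

With the hypothetical defaults, the first probe goes out after 120 seconds of silence, which is below the 4-minute timeout described above.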

 

Summary

Once the TCP idle timeout of the Azure Load Balancer is exceeded, the whole IFS application gets stuck without any timeout message.

The maximum timeout that can be set is 30 minutes. If an operation that returns to the client runs for more than 30 minutes, the IFS application will hang. This is a reliability concern for customers who are running IFS on Azure. For now, the workaround is to schedule the report or functional flow where that is possible. To conclude, we suggest this as a valid point to consider when reviewing the existing architecture in future releases.

 


10 replies

Userlevel 7
Badge +20

Hi Mino

 

Thanks for sharing this and for the detailed explanation :)

We have observed the issue with other load balancers such as NetScaler and F5, which have a similar idle timeout setting, so it’s not only Azure.

The attached zip contains a quick test to identify the issue; I hope it will be useful.

It adds a custom menu in the Customer window that starts a 900-second active DB call. If you are connecting through a load balancer with an idle timeout shorter than this, the RMB will never finish 😉
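
For readers without access to the zip, the same kind of check can be sketched outside IFS. The Python below is not the attached test and makes no use of IFS APIs; it is a hypothetical stand-in in which a small TCP server stays silent for a chosen time before replying and the client waits with a deadline instead of hanging forever. The names `start_slow_server` and `call_with_deadline` are invented for this sketch. Run locally there is no load balancer in the path, so the idle drop itself only shows up when the server sits behind the load balancer being tested.

```python
import socket
import threading
import time


def start_slow_server(delay_seconds: float) -> int:
    """Start a one-shot local TCP server that stays silent for
    delay_seconds before replying. Returns the port it listens on."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def serve() -> None:
        conn, _ = listener.accept()
        with conn, listener:
            conn.recv(1024)            # read the request
            time.sleep(delay_seconds)  # stand-in for the long DB call
            try:
                conn.sendall(b"DONE")
            except OSError:
                pass  # the client already gave up

    threading.Thread(target=serve, daemon=True).start()
    return port


def call_with_deadline(host: str, port: int, deadline_seconds: float) -> str:
    """Send one request and wait at most deadline_seconds for the reply."""
    with socket.create_connection((host, port), timeout=deadline_seconds) as sock:
        sock.sendall(b"RUN")
        try:
            return sock.recv(1024).decode()
        except socket.timeout:
            return "NO RESPONSE"


if __name__ == "__main__":
    # Short values so the demo finishes quickly; a real check would use a
    # delay longer than the idle timeout under test, such as 900 seconds.
    port = start_slow_server(0.2)
    print(call_with_deadline("127.0.0.1", port, 2.0))
```

The client-side deadline is the design point here: a lost response shows up as "NO RESPONSE" after a bounded wait rather than as a client that never returns.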

 

Hope it helps!

Userlevel 7
Badge +19

 @dsj 
 Hello Daji ,
     
Thank you very much for the thought-provoking feedback. Well, then we could categorize this as a general phenomenon for load balancers.

Besides, I tried your custom menu code to check whether the DB call returns to the client. It is a good indicator to use in other scenarios as well. I highly appreciate that ☺.
  
/Mino
 

Userlevel 7
Badge +18

@Minoshini Fonseka @dsj 

Thank you very much for the information and the scripts.

We are using Azure for our servers and have a WAF set up. I do not see any load balancers set up, so I am unsure which setting to amend.

 

@dsj I used your scripts and, as you specified, the RMB hung even after 20 minutes had passed.

 

Any ideas?

Userlevel 3
Badge +3

@Minoshini Fonseka 

Hello, I would like to confirm the detailed cluster structure in this picture.

Although this topic was posted a year ago, I would very much appreciate it if you could answer!

  1. Are “Server 1” and “Server 2” Oracle DBs?
  2. Between “Client” and “Azure Load Balancer”, is there only one middleware server?

 

Thank you

Yingxin 

Userlevel 7
Badge +20

Hi @johnw66 ,

 

I haven’t worked with a WAF setup, but I assume you have a WAF on Application Gateway? Application Gateway works with HTTP(S) rules, and the request timeout in its HTTP settings probably applies here.

Azure Application Gateway HTTP settings configuration | Microsoft Docs

If you have a load balancer between the IFS App server and Application Gateway, then the TCP timeout of that load balancer also needs to be checked. More information on that topic is described above.

 

Hope it helps!

Damith

 

 

Userlevel 7
Badge +19

Hi @necyingxz,

Thank you for your reply. 

To explain the diagram: it shows the architecture from the client level up to the middleware server.

In a general scenario, a call goes as follows:
Request  : Client → Middleware server → Database
                                           |
                                       Processing
                                           |
Response : Client ← Middleware server ← Database

When a load balancer is integrated, it sits between the client and the application server. A load balancer must be placed in front of the cluster in order to distribute the load among the cluster nodes. That is why the load balancer sits between the client and the application servers, as shown in this diagram.




Further, I would like to add the IFS architectural diagram, which is useful for understanding the complete overview.



Answering your questions: 

  1. Are “Server 1” and “Server 2” Oracle DBs?
    Answer : Server 1 and Server 2 are application servers, to which the load balancer distributes the load.
     
  2. Between “Client” and “Azure Load Balancer”, is there only one middleware server?
    Answer : The load balancer is between the client and the middleware server.

    I hope this clarifies your concerns. Thank you.
Userlevel 3
Badge +3

Hi @Minoshini Fonseka, thank you very much for the explanations.

Actually, we have a similar issue, which I have asked about in another post:

Could you have a look at it and give me some advice?

Thank you!

Userlevel 5
Badge +13

Hi,

What happens when the process takes longer than 30 minutes? I have the same problem with a customer who receives serials in a purchase order. With a supportnet connection it takes around 40 minutes, but the customer is directly connected. So what is the option here?

Regards,

Pilar

Userlevel 7
Badge +19

@Pilar Franco :
First, at the database level, you can verify approximately how much time it takes to execute that flow. If it is more than 30 minutes, either the performance of that flow should be improved in the product, or it could be scheduled to run as a background job, if the application flow allows it.

Userlevel 5
Badge +13

Thanks Minoshini!

That’s indeed what we are currently working on. In the customer’s environment it takes more than 30 minutes, so we requested RnD to enable this process to run in the background too (purchase receipt of 1500 serials).

We are also analyzing why this process is taking so long in the customer’s environment.

Regards,

Pilar
