Solved

Apps9: How often does IFS go down/offline?

  • 4 August 2021
  • 7 replies
  • 271 views

Userlevel 6
Badge +11

We are in the process of upgrading our servers and have run into some unknowns that we couldn't cover in testing. These have taken the live IFS application offline, sometimes for up to 3 hours, while we restart servers and work out what caused the interruption.

The business are understanding but expect everything to be up 24/7, 365 days a year.

I am trying to manage expectations, and I'm wondering about others' experience. How often does IFS suffer service interruptions? We are averaging 98% uptime overall.
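For context, an availability percentage converts directly into a downtime budget. The sketch below assumes uptime is measured around the clock over a 365-day year (8,760 hours). On that basis, 98% allows roughly 175 hours of downtime a year (about 14.6 hours a month), while 99.9% allows under 9 hours a year.

```python
# Assumes uptime is measured 24/7 over a 365-day year (leap years ignored).
HOURS_PER_YEAR = 365 * 24  # 8760 hours


def downtime_hours(availability: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime (in hours) that an availability fraction allows over a period."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_hours


if __name__ == "__main__":
    for target in (0.98, 0.99, 0.999, 0.9999):
        per_year = downtime_hours(target)
        per_month = downtime_hours(target, HOURS_PER_YEAR / 12)
        print(f"{target:.2%} uptime: {per_year:6.1f} h/year, {per_month:5.1f} h/month")
```

Running it prints a small table that can be shown to the business when agreeing on a realistic target.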


Best answer by NickPorter 4 August 2021, 14:47


This topic has been closed for comments

7 replies

Userlevel 6
Badge +18

It will depend a lot on what you’re doing with the system (e.g. steady-state vs actively adding modules or expanding use of the system), and how you have it configured (e.g. Oracle RAC or clustered application servers can help with uptime in the event of some issues).

IFS itself is pretty stable (if we exclude planned maintenance), but Apps9 seems to want a reboot every couple of months... it feels like there’s a memory leak in there somewhere. 

That said, with everything included, we currently get some kind of small service issue every month: the internet going down, Windows server lockups, or database failovers that mean we need to reboot the Apps Server to reconnect WADACO. Nothing critical has been caused by IFS recently, though.

Nick

Userlevel 6
Badge +11

We have had to upgrade our Oracle servers and move to version 19c, which is the recent big change that has caused a few interruptions. We run on-premises and are moving to a clustered setup, and the failsafe manager sometimes detects an interruption on the listener and switches to the second node. That switchover itself causes a short interruption of about 10 minutes, and afterwards some of our interfaces aren't always running, the print server needs a restart, and so on.

Similar to you, we have issues with Windows server lockups (often due to recent Windows updates!) that require Apps Server reboots.

Like you, nothing critical, but it nonetheless makes the front end inaccessible while we reboot and bring everything back up.

Maybe it's something we will have to live with. There are a lot of components, and however much you test, something always slips through.

My challenge is managing the business's expectations. We have suggested changes to improve stability, but ultimately, when the system is in a steady state, we see no interruptions.

Playing catch-up with overdue upgrades is the main cause, I am finding.

Userlevel 6
Badge +18

As a side note, if your business is truly expecting 24x7x365 uptime for this kind of solution, with no outages in any area at all, that is pretty unrealistic and will be very complex and expensive. You would need redundant everything to allow for server work (e.g. Windows server patching that requires reboots). We have monthly standing service windows so we can regularly work on some items, but even then there are some things we have to time for shift changes to minimize the impact on our 24x7 sites.
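One way to see why "redundant everything" is needed: if a single server is available a fraction a of the time, then n redundant copies (any one of which can carry the full load) are all down at once only with probability (1 - a)^n. This is a minimal sketch that assumes failures are independent; shared storage, shared networks, and patching both nodes at once all break that assumption, so real-world gains are smaller.

```python
def parallel_availability(node_availability: float, nodes: int) -> float:
    """Availability of `nodes` redundant copies where any single copy can carry the load.

    Assumes independent failures: the service is down only when every copy
    is down at the same time, which happens with probability (1 - a) ** n.
    """
    if nodes < 1:
        raise ValueError("need at least one node")
    return 1.0 - (1.0 - node_availability) ** nodes


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} node(s) at 98%: {parallel_availability(0.98, n):.4%}")
```

Two servers at 98% each come to about 99.96% on paper, but only if maintenance is done one node at a time while the other keeps serving, which is exactly the rolling-maintenance setup that makes 24x7 expensive.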

Userlevel 6
Badge +12

We struggle with IFS uptime, but only with the middleware services (MWS). We are on Apps 10.

Our main issues occur when restarting the Windows servers for monthly maintenance: node managers and the HttpServer module sometimes do not start correctly, and several times now (across multiple servers and instances), the HttpServer module has got stuck in a SHUTDOWN state.
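Because a stuck HttpServer is not always obvious right after a reboot, one option is to end each maintenance window with an explicit check that the web front end actually answers, rather than assuming a clean start. The sketch assumes the HttpServer serves a URL you can request over HTTP; the URL in the usage comment and the timeout and interval values are hypothetical placeholders. It only detects the problem early; it does not fix the SHUTDOWN state.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_probe(url: str, timeout_s: float = 10.0) -> bool:
    """Return True if `url` answers with a 2xx or 3xx status, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException, ValueError):
        # URLError/HTTPError (4xx/5xx) and socket timeouts all subclass OSError;
        # ValueError covers malformed URLs.
        return False


def wait_until_healthy(
    probe: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `probe` until it succeeds or `timeout_s` elapses; return the final result."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Usage (hypothetical URL):
# ok = wait_until_healthy(lambda: http_probe("https://ifs-mws.example.local/"))
# if not ok:
#     print("HttpServer did not come up; check for a SHUTDOWN state before closing the window")
```

The `sleep` and `clock` parameters exist only so the polling loop can be tested without real waiting.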

When this happens we follow some troubleshooting steps involving mws-svr.cmd and looking for rogue Java processes. Through a mix of restarts, killing Java processes, and rebooting, we can usually get IFS back up and running. But it is rather disappointing. In some scenarios it takes a long time to get the system back up, and we are no closer to understanding what has happened once service is restored. In one case, the consulting partners we are working with had to go so far as to reinstall the entire MWS layer. That made no sense to me, but they did it anyway. We have also had situations where a full reconfiguration of MWS to reset it did not solve the HttpServer SHUTDOWN issue. That is the main problem I'd love to get some help on.
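Of the troubleshooting steps described, the hunt for rogue Java processes is the easiest to make repeatable. The sketch below assumes a Windows host and uses the standard `tasklist` command with CSV output (`/FO CSV /NH`, whose columns are image name, PID, session name, session number, and memory usage). It only lists `java.exe` processes still running after the middleware has been stopped; deciding which ones to kill is left to a person, since other Java software may share the server.

```python
import csv
import io
import subprocess
from typing import List, Tuple


def parse_tasklist_csv(output: str, image_name: str = "java.exe") -> List[Tuple[int, str]]:
    """Return (PID, memory usage) for each row of `tasklist /FO CSV /NH` output
    whose image name matches `image_name` (case-insensitive)."""
    matches = []
    for row in csv.reader(io.StringIO(output)):
        # Expected columns: image name, PID, session name, session #, mem usage.
        if len(row) >= 5 and row[0].lower() == image_name.lower():
            matches.append((int(row[1]), row[4]))
    return matches


def list_java_processes() -> List[Tuple[int, str]]:
    """Windows only: list java.exe processes that are still running."""
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return parse_tasklist_csv(result.stdout)


# Usage, after stopping the middleware services:
# for pid, mem in list_java_processes():
#     print(f"java.exe still running: PID {pid}, {mem}")
```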

The Oracle database servers have been rock solid, though, and the client software (we use both IEE and Aurena) works pretty well. It is just the middleware servers that give us problems about once a month. But we need to do monthly maintenance to keep up with security patches, etc., so we are sort of stuck between a rock and a hard place. Our consulting partner has not been much help in giving us a standard checklist of shutdown and startup procedures, mainly because the downtime issues we have had seem to be slightly different every time and no one is able to tell us what exactly causes them in the first place.
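Without a vendor checklist, a dependency map is one way to write down a startup and shutdown order that stays consistent from one maintenance window to the next. The component names and dependencies below are hypothetical placeholders loosely based on the pieces mentioned in this thread. The startup order comes from a topological sort of the map, and the shutdown order is simply its reverse. Python's standard `graphlib` module (3.9+) raises `CycleError` if the map contains a circular dependency.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical map: each component lists what must already be running before it starts.
DEPENDS_ON: Dict[str, Set[str]] = {
    "database": set(),
    "node_manager": {"database"},
    "mws_servers": {"node_manager"},
    "http_server": {"mws_servers"},
    "print_server": {"mws_servers"},
    "interfaces": {"mws_servers", "database"},
}


def startup_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Order components so every dependency starts before its dependents."""
    return list(TopologicalSorter(deps).static_order())


def shutdown_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Stop dependents first: the reverse of the startup order."""
    return list(reversed(startup_order(deps)))


if __name__ == "__main__":
    for step, component in enumerate(startup_order(DEPENDS_ON), start=1):
        print(f"{step}. start {component}, then confirm it is healthy")
```

Writing the map down also documents assumptions (for example, that the print server needs the middleware up first) that can be corrected as root causes are found.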

Sorry, this probably isn’t much help, but at least you know other folks struggle with uptime as well. Are you seeing issues mainly in the database, middleware, or client layer?

 

Thanks,

Joe Kaufman

Userlevel 6
Badge +11

Yup, I am trying to explain that to the business. Unfortunately it's an uphill battle; they see these things as a cost issue rather than a matter of system integrity.

Thanks for that input, though. It's good to confirm that others agree 100% uptime (especially with our configuration) is not realistic. (We don't need it to be quite 24/7, 365 days a year, but the maintenance windows are very small.)

Userlevel 6
Badge +11

Not at all, it's very helpful. Same with us: it's mostly MWS that causes issues. Okay, we have had some exceptions with the recent Oracle server upgrade and the move to clustered nodes, but in the past it was always a rogue Java process or a build-up of stuck threads that required a restart to clear. Each time, like you, we reach out to consultants and IFS with little to no success.
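If the middleware is built on Oracle WebLogic Server (the thread only describes it as a modified Oracle toolset, so this is an assumption), stuck threads are typically logged with a "[STUCK]" marker and the message ID BEA-000337. Counting those entries after each incident gives some hard evidence to hand to consultants, and a rising count can warn that a restart is coming. The patterns below are assumptions; check them against your own server logs before relying on this.

```python
import re
from typing import Iterable, List

# Assumption: WebLogic-style server logs mark stuck threads with "[STUCK]"
# and/or the message ID BEA-000337. Verify against your own logs.
STUCK_PATTERN = re.compile(r"\[STUCK\]|BEA-000337")


def stuck_thread_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention a stuck thread."""
    return [line.rstrip("\n") for line in lines if STUCK_PATTERN.search(line)]


def stuck_threads_exceed(lines: Iterable[str], threshold: int) -> bool:
    """True when the number of stuck-thread lines reaches `threshold`."""
    return len(stuck_thread_lines(lines)) >= threshold


# Usage (hypothetical log path):
# with open(r"C:\path\to\server.log", encoding="utf-8", errors="replace") as log:
#     hits = stuck_thread_lines(log)
# print(f"{len(hits)} stuck-thread entries")
```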

The business expects black-and-white answers to a very shades-of-grey issue, one that takes a lot of troubleshooting and time to find the root cause.

I have had similar problems on other systems I have used, so I know it's not limited to IFS.

Userlevel 6
Badge +12

We are coming from file-server-based solutions (custom apps written in Visual FoxPro), so the downtime of "world class" software has been, as I said, disappointing. We actually had better uptime with old-school DBF-based database applications, because all they need is for the file servers to be up and a strong LAN (which we have).
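The uptime comparison has a simple arithmetic basis: when every component in a chain must be up, their availabilities multiply, so each extra tier lowers the total. The sketch assumes independent failures, and the 99.9% figures in it are illustrative, not measurements of either setup.

```python
from math import prod


def serial_availability(*components: float) -> float:
    """Availability of a chain where every component must be up.

    Assumes independent failures, so the individual availabilities multiply.
    """
    return prod(components)


if __name__ == "__main__":
    # Illustrative figures only: every component assumed to be up 99.9% of the time.
    file_server_app = serial_availability(0.999, 0.999)  # file server + LAN
    # e.g. LAN, database server, database, MWS, HttpServer
    multi_tier_app = serial_availability(0.999, 0.999, 0.999, 0.999, 0.999)
    print(f"two components:  {file_server_app:.2%}")
    print(f"five components: {multi_tier_app:.2%}")
```

With every part at 99.9%, two components come to about 99.80% and five to about 99.50%: fewer moving parts, fewer chances for an outage.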

Management seems to understand what we are up against, though, so that is good. I just wish someone could give us reliable intel on what makes the middleware layer so relatively fragile. As far as I know, it's basically a modified Oracle toolset. How Oracle can make such a robust DBMS but a comparatively feeble MWS layer, I'll never understand.

 

Thanks,

Joe Kaufman