Question

IFS10 - CPU 100% and very slow system

  • 11 November 2019
  • 34 replies
  • 6222 views

Userlevel 4
Badge +9

After we upgraded to IFS10 we are struggling with slowness in our Oracle DB. We haven’t been able to pinpoint the cause of our problems. But very often (5-10 times each week), we see that the CPU goes to 100% and doesn’t let go. Every user in our company are then struggling with slowness when this happens. As an example, saving a time registration can take 3-4 minutes.

Today, a restart of the DB server was necessary to get it back to  normal.  

We have used the query below to find CPU consuming operations, and have also checked background jobs and DMBS jobs. Nothing special has been running when the problems occur, or at least nothing that helps us to conclude on a specific problem.

select s.username,
s.inst_id,
s.sid,
s.serial#,
s.status,
(select value from gv$sesstat natural join v$statname where name = 'physical reads' and sid=s.sid)disk_reads,
(select value from gv$sesstat natural join v$statname where name = 'session logical reads' and sid=s.sid) logical_reads,
(select value/100 cpu_sec from gv$sesstat natural join v$statname where name = 'CPU used by this session' and sid=s.sid)cpu_sec,
s.logon_time,
s.program,
s.osuser,
s.machine,
p.pid,
p.spid,
s.client_identifier,
s.client_info,
s.module,
s.action,
s.event,
s.wait_class,
case
when s.status = 'ACTIVE' then
s.last_call_et
else
null
end wait_sec,
s.blocking_session_status,
s.blocking_instance,
s.blocking_session,
s.sql_id,
s.sql_hash_value,
s.sql_address,
s.prev_sql_id,
s.prev_hash_value,
s.prev_sql_addr,
s.sql_child_number,
s.prev_child_number,
s.server
from gv$session s join gv$process p on (s.inst_id = p.inst_id and s.paddr = p.addr)
where s.type = 'USER' and s.status='ACTIVE'
order by disk_reads desc

We have integrations and scheduled tasks running, all which have been running fine in IFS 8.1 for years.

After googling, we some people says that Oracle 12.2 is not as effective as 11, but we can’t verify this.

Does anyone have some tips to solve these things, or at least find the root cause when the problems occur? Appreciate all help we can get. 

Some facts:

  • Around 5-700 concurrent users
  • Using Azure AD for authentication
  • Using around major 10 Modules in IFS (DocMan, Finance, Engineering, SCM, etc)
  • DB size around 600GB
  • Running reports with Crystal Reports (set up via web, running queries towards IFS)

 


34 replies

Userlevel 5
Badge +9

We have issues with our test environment DB running slow too. We determined a couple factors:

  • Prod is running on higher quality hardware in our Data Centre
  • Test is running on-prem on still good hardware but not as high caliber
  • There is a minor oracle version difference between the oracle versions between our prod and test environments. There is a bug in the minor oracle version Our test environment uses that causes the scheduled jobs to run very slowly which slows the whole system down. 

We were told the solution was to upgrade our oracle version which we haven’t done yet as we don’t have an in-house oracle DBA and consultants can be expensive (especially for a test environment).

Userlevel 4
Badge +9

I know I am late to the party here and we are on IFS9  not 10 .   But we upgraded from standard edition of oracle to Enterprise addition   Created all new servers for IFS boxes.   We are seeing performance issues also in oracle.   We figured that more cpu, more ram would make it faster on enterprise than it had been on Standard Edition.   What we are finding now is issues in Crystal Reports.  it appears if we convert the operational reports from not linking to addition views and instead going in and changing the report to use a single IAL to access the data then the report is running better.     Hoping for a magical tweak to Crystal that would get us back to where we were at on Standard Edition of Oracle.since we dont want to modify every one of our crystal reports in existence.    This process is also causing slowdowns and problems for other transactions in IFS Core as well.      Was wondering what you all did to solve your issues 

Thanks 
Allan 

Hi Allan, 

We have just been in a process of changing our IT Operation provider. With them, new servers with top CPU and ram was delegated to our IFS environment and after that speed has been much better. Still some things not working 100% percent and we need to rerun som DB indexes from time to time (could maybe be done other ways that we are not aware of). But compared to when I created this post, it’s night and day! We are running on Oracle Standard edition. 

 

We are also running SAP Crystal Reports, but through a web page and not through the built-in IFS CR. 

 

 

Userlevel 2
Badge +8

I know I am late to the party here and we are on IFS9  not 10 .   But we upgraded from standard edition of oracle to Enterprise addition   Created all new servers for IFS boxes.   We are seeing performance issues also in oracle.   We figured that more cpu, more ram would make it faster on enterprise than it had been on Standard Edition.   What we are finding now is issues in Crystal Reports.  it appears if we convert the operational reports from not linking to addition views and instead going in and changing the report to use a single IAL to access the data then the report is running better.     Hoping for a magical tweak to Crystal that would get us back to where we were at on Standard Edition of Oracle.since we dont want to modify every one of our crystal reports in existence.    This process is also causing slowdowns and problems for other transactions in IFS Core as well.      Was wondering what you all did to solve your issues 

Thanks 
Allan 

Badge +1

Hi,

i hope your issues has been resolved. We went through something similar when upgrading from Oracle 11 → 12. 

Looking from your pfile you have optimizer_features_enable: 11.2.0.4 which did actually give some boost to performance.

However after months of searching and comparing SQL executionplans we ended up with:

optimizer_adaptive_features=FALSE
optimizer_dynamic_sampling=2

The adaptive feature did not work for us at all so we disabled it. We have been running with these settings for a couple of years now and works ok. Big jobs like MRP now runs at same performance as with Oracle 11.

Userlevel 4
Badge +9

You mentioned early in the thread that the index rebuilds hadn’t happened since the upgrade, which may explain your slow searching that you mentioned was an ongoing issue.

If you haven’t already, you should analyze and if needed rebuild them.

Nick

Sorry for the late response on this. We validated and ran the indexes right after tips here, but this didn’t have any effect. We have also included a few new indexes without any big effect.

In general we are still struggling with much slower IFS10 then IFS8, and our users are getting a little Impatient. 

I fear that some slowness is coming directly from the upgrade from IFS8.1. That our basic data needs to be resigned (should have been informed by the IFS consultants in our projects?).

Does anyone have experience with certain modules in IFS that needs special attention in IFS10 compared to IFS8 and tips to be applied? For example we see that the Employee Access functions in IFS business logic are causing much of the slowness (Time Authentication, Employee Results, etc).

But also loading of certain projects in Project Navigator and searching for project transactions are much slower. Often ending in timeout (>30min) compared to running in 8-10minutes in IFS8.1.

Userlevel 6
Badge +18

You mentioned early in the thread that the index rebuilds hadn’t happened since the upgrade, which may explain your slow searching that you mentioned was an ongoing issue.

If you haven’t already, you should analyze and if needed rebuild them.

Nick

Badge +3

I now recently started with instructing some endusers to have the debug console active .. clear from time to time .. hoping that from the exports will come something usable.  For the startup slowness, apparently that is acknoledged by IFS support, hopefully they can make their point with R&D.

 

 

Userlevel 4
Badge +9

@filiep.vw We haven’t seen so much of the 100% CPU lately, but slower IFS10 than v.8.1 is still an ongoing issue for us. We’ve found that changing the Oracle paramenter to 11.2.0.4 seems to have most effect, but this is not something we want permanent. 

Two Oracle experts have been looking at this, and we have installed Oracle Statspack to identify problems. Some of the problems were due to own customizations not working properly after the upgrade, but some are also due to IFS functionality that works slower in IFS10 than in IFS8.1.

Our next step now will be to create IFS support tickets for each slow area we find.

I would very much like IFS to acknowledge this issue and make a general statement which areas are effected and what can be done.  

Badge +3

Did you get any further with this? 

We have the same issues with our database server (Oracle Enterprise 12g) and nobdy seems to bother about this. 

Should IFS not have the experience to pinpoint the issue? This is clearly not an isolated problem.

Userlevel 4
Badge +9

@alanbclark, the first query return 0 records. Does this mean that disabling parallel execution is not the reason why it’s better now?

 

@terje , our Oracle DB partner doubt that RMAN is the cause of slowness. Daily incremental backup at 19:00, runs for 20min, log backup each half hour, but this is finished in 1 minute. Full backup runs for a while, but this is only running in weekends. 

We’ll check the swapping settings. 

Userlevel 1
Badge +2

If you look in your top SQL you might see:

select toid from type$ where…

It is related to metadata loading in Aurena and consumes quite some resources during metadata load. This is due to a bug in Oracle which can be resolved by applying a patch. Basically the patch solves a missing sys index.

 

Search Bug 25154887 : RECURSIVE QUERIES ON TYPE$ RUN DURING DESC SLOW IN 12C in Oracle Metalink.

 

//Fredrik

Userlevel 5
Badge +7

It would be interesting to know what and how many objects are available for parallel execution. Most IFS objects have a parallel processing degree of 1, and therefore won’t start up parallel processes.

 

SELECT owner, table_name, degree
FROM dba_tables
where trim(degree) <> '1'
order by owner, table_name;
 

Here’s q query where you can view parallel statistics should it still be enabled on other databases. Needs privilege to run this query.

select * from v$pq_sysstat;

 

Badge +1

SSD swap area is a safety net, that only should be used in rare situations.

If you OS or DB is swapping, during daily operations, your performance will be downgraded.

81,6 GB SGA should be enough, if the db parameters are optimized.

If you have swapping on OS level, you need give more memory to the OS, reducing the DB SGA.

Userlevel 4
Badge +9

@terje , we have 96GB of RAM and the SSD swap drive is set to 200GB. Would think this is enough?

 

We'll check our RMAN and Server backup. 

Badge +1

If your Database server is swapping, you really need to add more memory.

 

Another reason for poor performance can be backup jobs, either RMAN or Server backup

Userlevel 4
Badge +9

Just wanted to give an update on this. 

We did three changes and haven’t seen the CPU spike to 100% after this:

  • Changed Parallel Execution from 640 to 0 (alter system set parallel_max_servers=0 scope=both;)
  • Changed optimizer_features_enable to 11.2.0.4
  • Changed disks handling for swapfile, changed to SSD and size according to RAM

We are still struggling with slower query results than in IFS8, much more waiting for our users. If anyone have tips on how what to check, any help will be appreciated! Especially the HR checks (User_access_api) and Language/Translation checks (DOMAIN_SYS) seems to take much longer time in IFS10. We are only using one language in our IFS version. Could maybe be an configuration error in our system?

Userlevel 7
Badge +15

From a Crystal Reports framework perspective make sure you have installed the new version of the Crystal Web Service and the Crystal .net runtime that comes with Apps 10.

Also you can enable the CWS logs and see how much time it takes to process and generate the layout.

Userlevel 4
Badge +9

Hi,

Is this a Multi-Tenant database or a standalone database? Are you running the base version of Oracle 12.2. or have you applied the later patch releases? There were significant performance issues related to Multi-tenant databases in the base release which were fixed in later PSU’s. 

It’s a standalone database. We are running latest PSU’s

Userlevel 7
Badge +18

I reported the average session CPU separately from the average call CPU. Most sessions will tend to be currently idle.

Userlevel 5
Badge +10

Thanks for the SQL; this is really interesting and something i will use for analysis on any performance issues.

I notice a difference between @durette  and @alanbclark  SQL scripts in that the percentage columns return very different results and this is the field used for ordering the highest consuming job. @alanbclark percentage column seems to have values for most lines and the value seems sensible, whereas @durette only a small number have a value returned in percentage and the rest are empty.

Are you able to explain why different?

Which of the two SQL statements should be more accurate to use?

Regards

Shaun

Userlevel 7
Badge +18

@alanbclark , this is definitely going in my toolkit. 

I’d like to suggest a few edits:

∙ You can usually look at the CLIENT_INFO to get username information for IFSSYS sessions.
∙ As I said, V$SQL may include multiple copies of a given query for things like index changes or for a different parsing user. I added the child number to make a unique join.
∙ It might be helpful to separate the session CPU from the call CPU. Otherwise, a session that starts out idle and then later does a lot of work might get underreported.
∙ I joined to active background jobs.
 

 

Userlevel 7
Badge +21

Hi,

Is this a Multi-Tenant database or a standalone database? Are you running the base version of Oracle 12.2. or have you applied the later patch releases? There were significant performance issues related to Multi-tenant databases in the base release which were fixed in later PSU’s. 

Userlevel 1
Badge +2

Hi,

Could you if possible get a statpack report for that system during the time period when the problem occur. 

 

Could you also describe the typical user behavior during this time frame?

 

Send the statpack report to me for further investigations in IFS Benchmarking lab.


 

Userlevel 7
Badge +18

You'll need to also join by the SQL child ID. If it's null on the session, you'll want 0 on the library cache. This happens because the same query can be executed with multiple plans depending on table statistics, available indexes, and the parsing schema.

Userlevel 4
Badge +9

Thanks @alanbclark . Is there any information to get from showing the INACTIVE sessions? I see that the query I posted in my first post shows almost the same information, but only showing the ACTIVE sessions. My also shows the SQL running. 

Reply