Question

IFS10 - CPU 100% and very slow system

  • 11 November 2019
  • 34 replies
  • 6171 views

Userlevel 4
Badge +9

After we upgraded to IFS10 we are struggling with slowness in our Oracle DB. We haven’t been able to pinpoint the cause of our problems. But very often (5-10 times each week), we see that the CPU goes to 100% and doesn’t let go. Every user in our company are then struggling with slowness when this happens. As an example, saving a time registration can take 3-4 minutes.

Today, a restart of the DB server was necessary to get it back to  normal.  

We have used the query below to find CPU consuming operations, and have also checked background jobs and DMBS jobs. Nothing special has been running when the problems occur, or at least nothing that helps us to conclude on a specific problem.

select s.username,
s.inst_id,
s.sid,
s.serial#,
s.status,
(select value from gv$sesstat natural join v$statname where name = 'physical reads' and sid=s.sid)disk_reads,
(select value from gv$sesstat natural join v$statname where name = 'session logical reads' and sid=s.sid) logical_reads,
(select value/100 cpu_sec from gv$sesstat natural join v$statname where name = 'CPU used by this session' and sid=s.sid)cpu_sec,
s.logon_time,
s.program,
s.osuser,
s.machine,
p.pid,
p.spid,
s.client_identifier,
s.client_info,
s.module,
s.action,
s.event,
s.wait_class,
case
when s.status = 'ACTIVE' then
s.last_call_et
else
null
end wait_sec,
s.blocking_session_status,
s.blocking_instance,
s.blocking_session,
s.sql_id,
s.sql_hash_value,
s.sql_address,
s.prev_sql_id,
s.prev_hash_value,
s.prev_sql_addr,
s.sql_child_number,
s.prev_child_number,
s.server
from gv$session s join gv$process p on (s.inst_id = p.inst_id and s.paddr = p.addr)
where s.type = 'USER' and s.status='ACTIVE'
order by disk_reads desc

We have integrations and scheduled tasks running, all which have been running fine in IFS 8.1 for years.

After googling, we some people says that Oracle 12.2 is not as effective as 11, but we can’t verify this.

Does anyone have some tips to solve these things, or at least find the root cause when the problems occur? Appreciate all help we can get. 

Some facts:

  • Around 5-700 concurrent users
  • Using Azure AD for authentication
  • Using around major 10 Modules in IFS (DocMan, Finance, Engineering, SCM, etc)
  • DB size around 600GB
  • Running reports with Crystal Reports (set up via web, running queries towards IFS)

 


34 replies

Userlevel 5
Badge +7

Try this one. I removed an inner select statement that displays the SQL_TEXT of the command being run. You can find the SQL commands being run by querying the v$sql VIEW and matching the SQL_ID being returned form the query shown below.

select sess_cpu.con_id, 
sess_cpu.sid, 
NVL(sess_cpu.username, 'Oracle Process') username, 
sess_cpu.status, 
sess_cpu.logon_time, 
sess_cpu.machine, sess_cpu.program, sess_cpu.type, sess_cpu.event,
round((sysdate - sess_cpu.logon_time)*1440*60) logon_SECS, 
sess_cpu.value/100 SESS_CPU_SECS, (sess_cpu.value - call_cpu.value)/100 LAST_CPU_SECS,
round((sess_cpu.value/100)/round((sysdate - sess_cpu.logon_time)*1440*60)*100,2) Percent, 
sess_cpu.sql_id
from
(select se.con_id, se.sql_id, ss.statistic#, se.sid, se.username, se.status, se.program, se.logon_time, sn.name, ss.value,
se.machine, se.type, se.event
from v$session se, v$sesstat ss, v$statname sn
where se.sid = ss.sid
and sn.statistic# = ss.statistic#
and sn.name in ('CPU used by this session') ) sess_cpu,
(select se.con_id, ss.statistic#, se.sid, ss.value, value/100 seconds 
from v$session se, v$sesstat ss, v$statname sn
where se.sid = ss.sid
and sn.statistic# = ss.statistic#
and sn.name in ('CPU used when call started') ) call_cpu
where sess_cpu.sid = call_cpu.sid
and sess_cpu.con_id = call_cpu.con_id
and round((sysdate - sess_cpu.logon_time)*1440*60) > 0
order by percent desc;
 

 

Userlevel 5
Badge +7

As an alternative to a $2,000 utility, here is a query that will show sessions ranked by CPU usage and shows helpful information about the session so you can track down the process/user. The query displays helpful information like username, machine, program, type, event, and the sql text/command being referenced.

 

select sess_cpu.con_id, 
sess_cpu.sid, 
NVL(sess_cpu.username, 'Oracle Process') username, 
sess_cpu.status, 
sess_cpu.logon_time, 
sess_cpu.machine, sess_cpu.program, sess_cpu.type, sess_cpu.event,
round((sysdate - sess_cpu.logon_time)*1440*60) logon_SECS, 
sess_cpu.value/100 SESS_CPU_SECS, (sess_cpu.value - call_cpu.value)/100 LAST_CPU_SECS,
round((sess_cpu.value/100)/round((sysdate - sess_cpu.logon_time)*1440*60)*100,2) Percent, 
sess_cpu.sql_id,
(select sql_text from v$sql sql where sql.sql_id = sess_cpu.sql_id) sql_text
from
(select se.con_id, se.sql_id, ss.statistic#, se.sid, se.username, se.status, se.program, se.logon_time, sn.name, ss.value,
se.machine, se.type, se.event
from v$session se, v$sesstat ss, v$statname sn
where se.sid = ss.sid
and sn.statistic# = ss.statistic#
and sn.name in ('CPU used by this session') ) sess_cpu,
(select se.con_id, ss.statistic#, se.sid, ss.value, value/100 seconds 
from v$session se, v$sesstat ss, v$statname sn
where se.sid = ss.sid
and sn.statistic# = ss.statistic#
and sn.name in ('CPU used when call started') ) call_cpu
where sess_cpu.sid = call_cpu.sid
and sess_cpu.con_id = call_cpu.con_id
and round((sysdate - sess_cpu.logon_time)*1440*60) > 0
order by percent desc;

 

 

Userlevel 6
Badge +9

Presuming most of the CPU is being consumed by the Oracle service,

Have you checked the memory usage of the server whilst CPU hits 100%? also does CPU reach 100% only when certain batch jobs execute? (you can check this by querying the executing background jobs when CPU hits 100%). This should shed some light on the issue.


If there are no background jobs running and CPU hits 100% and there is no significant errors found in the Oracle alert log, I believe an Oracle Stats pack report or AWR report has to be analyzed in order for any issues.

Userlevel 5
Badge +7

Pinpointing these issues may take several queries and a bit of monitoring to pin down. Can you please provide the following information.

  1. The server has 96 GB of memory installed. How much memory is being used by the “oracle.exe” process(es)? Please confirm that this server only hosts one database.
  2. What are the values of the following Oracle parameters?
    • compatible
    • cpu_count
    • db_cache_size
    • job_queue_processes
    • optimizer_deatures_enable
    • optimizer_index_cost_adj
    • pga_aggregate_target
    • processes
    • sessions
    • sga_max_size
    • sga_target
    • shared_pool_size
    • sort_area_size
  3. How many cores are assigned to this server?
  4. Is the server physical or virtual?
  5. Are the table statistics current (withing the past week)?
  6. Have the indexes been validated lately?
  7. What is the operating system version?
  8. Do you have a utility installed to view the threads under the “oracle.exe” process to see what type of process (worker, execution, job, etc) is consuming the CPU? All sessions run as a separate thread under the “oracle.exe” process.
  9. If the issue went away when the database was restarted it was either a background job (IFS or database) or an interactive process launched by a user. Please comment.
  10. What is your schedule for updating the IFS search indexes on objects? Do you have all of them enabled?
  11. When the excessive CPU was found was the database creating large numbers of archive logs? This will let us know if the process was performing any inserts, updates, or deletions as part of the processing.
  12. Do you have the in-memory option turned on?
Userlevel 6
Badge +18

Hello,

It is interesting to note that your Apps 10 database is facing performance issues, especially CPU spiking at 100%. In our experience, Apps 10 database is designed to perform fast.

  1. Please check to see if there is any Oracle maintenance jobs running as per the Oracle alert log around the same time when the CPU spikes up. We had a client who ran into a similar issue running Oracle 12.1.0.2 and it turned out to be a maintenance job. The issue went away once the job was disabled.  
  2. Check for blocking locks in the database using the query below. This also causes high CPU usage.
  3. ------- REM IFS SQL to find blocking locks ----------------
    select logon_time, sid, serial#, blocking_session, program, seconds_in_wait, machine, client_info, username "ORACLE USERNAME", wait_class, audsid, process
    from  v$session 
    where blocking_session is not NULL 
    order by logon_time;

Please check these two first and keep me posted.

Userlevel 7
Badge +18

Intelligent tuning is only possible with good measurements. If you’re not comfortable with STATSPACK (included with every version of Oracle) or AWR (included if you’ve separately licensed the Oracle Tuning Pack on top of Enterprise Edition), I highly recommend Database Performance Analyzer.

https://www.solarwinds.com/database-performance-analyzer

Userlevel 7
Badge +18

@alanbclark , this is definitely going in my toolkit. 

I’d like to suggest a few edits:

∙ You can usually look at the CLIENT_INFO to get username information for IFSSYS sessions.
∙ As I said, V$SQL may include multiple copies of a given query for things like index changes or for a different parsing user. I added the child number to make a unique join.
∙ It might be helpful to separate the session CPU from the call CPU. Otherwise, a session that starts out idle and then later does a lot of work might get underreported.
∙ I joined to active background jobs.
 

 

Userlevel 7
Badge +21

Hi,

Is this a Multi-Tenant database or a standalone database? Are you running the base version of Oracle 12.2. or have you applied the later patch releases? There were significant performance issues related to Multi-tenant databases in the base release which were fixed in later PSU’s. 

Userlevel 2
Badge +8

I know I am late to the party here and we are on IFS9  not 10 .   But we upgraded from standard edition of oracle to Enterprise addition   Created all new servers for IFS boxes.   We are seeing performance issues also in oracle.   We figured that more cpu, more ram would make it faster on enterprise than it had been on Standard Edition.   What we are finding now is issues in Crystal Reports.  it appears if we convert the operational reports from not linking to addition views and instead going in and changing the report to use a single IAL to access the data then the report is running better.     Hoping for a magical tweak to Crystal that would get us back to where we were at on Standard Edition of Oracle.since we dont want to modify every one of our crystal reports in existence.    This process is also causing slowdowns and problems for other transactions in IFS Core as well.      Was wondering what you all did to solve your issues 

Thanks 
Allan 

Userlevel 7
Badge +15

From a Crystal Reports framework perspective make sure you have installed the new version of the Crystal Web Service and the Crystal .net runtime that comes with Apps 10.

Also you can enable the CWS logs and see how much time it takes to process and generate the layout.

Userlevel 4
Badge +9

Just wanted to give an update on this. 

We did three changes and haven’t seen the CPU spike to 100% after this:

  • Changed Parallel Execution from 640 to 0 (alter system set parallel_max_servers=0 scope=both;)
  • Changed optimizer_features_enable to 11.2.0.4
  • Changed disks handling for swapfile, changed to SSD and size according to RAM

We are still struggling with slower query results than in IFS8, much more waiting for our users. If anyone have tips on how what to check, any help will be appreciated! Especially the HR checks (User_access_api) and Language/Translation checks (DOMAIN_SYS) seems to take much longer time in IFS10. We are only using one language in our IFS version. Could maybe be an configuration error in our system?

Badge +1

If your Database server is swapping, you really need to add more memory.

 

Another reason for poor performance can be backup jobs, either RMAN or Server backup

Userlevel 6
Badge +18

You mentioned early in the thread that the index rebuilds hadn’t happened since the upgrade, which may explain your slow searching that you mentioned was an ongoing issue.

If you haven’t already, you should analyze and if needed rebuild them.

Nick

Userlevel 4
Badge +9

@terje , we have 96GB of RAM and the SSD swap drive is set to 200GB. Would think this is enough?

 

We'll check our RMAN and Server backup. 

Badge +1

SSD swap area is a safety net, that only should be used in rare situations.

If you OS or DB is swapping, during daily operations, your performance will be downgraded.

81,6 GB SGA should be enough, if the db parameters are optimized.

If you have swapping on OS level, you need give more memory to the OS, reducing the DB SGA.

Userlevel 4
Badge +9

@NadeeshHerath , correct. All CPU is consumed by Oracle Service. The memory usage (tot 96GB) is constantly at around 90%, I think this is normal behavior. 

We checked the background jobs when the CPU hits 100% Nothing time consuming or big things was executing, so no findings there. 

 

We discovered that the SWAP drive was not configured according to increased RAM, and setup as SSD. This has been fixed now, waiting to check if this has an effect. 

 

We have also played with the parameter OPTIMIZER_FEATURES_ENABLE, setting this to 11.2.0.4. And turned off parallel execution. Not 100% if these settings has any effect yet. Does anyone have experience with this?

 

We haven’t checked the Oracle Alert log, but I would assume we would’ve heard from our DB partner company if anything is wrong there.  Can check with them. I also thing we should look at the Oracle Stats pack or AWR reports. 

Userlevel 5
Badge +7

It would be interesting to know what and how many objects are available for parallel execution. Most IFS objects have a parallel processing degree of 1, and therefore won’t start up parallel processes.

 

SELECT owner, table_name, degree
FROM dba_tables
where trim(degree) <> '1'
order by owner, table_name;
 

Here’s q query where you can view parallel statistics should it still be enabled on other databases. Needs privilege to run this query.

select * from v$pq_sysstat;

 

Userlevel 1
Badge +2

If you look in your top SQL you might see:

select toid from type$ where…

It is related to metadata loading in Aurena and consumes quite some resources during metadata load. This is due to a bug in Oracle which can be resolved by applying a patch. Basically the patch solves a missing sys index.

 

Search Bug 25154887 : RECURSIVE QUERIES ON TYPE$ RUN DURING DESC SLOW IN 12C in Oracle Metalink.

 

//Fredrik

Userlevel 4
Badge +9

@alanbclark, the first query return 0 records. Does this mean that disabling parallel execution is not the reason why it’s better now?

 

@terje , our Oracle DB partner doubt that RMAN is the cause of slowness. Daily incremental backup at 19:00, runs for 20min, log backup each half hour, but this is finished in 1 minute. Full backup runs for a while, but this is only running in weekends. 

We’ll check the swapping settings. 

Userlevel 4
Badge +9

@alanbclark , thanks for looking at this

  1. 81.6 GB. Yes, it only serves one database
    • compatible: 12.2.0
    • cpu_count: 16
    • db_cache_size: 40G
    • job_queue_processes: 100
    • optimizer_features_enable: 11.2.0.4
    • optimizer_index_cost_adj: 10
    • pga_aggregate_target: 2000M
    • processes: 800
    • sessions: 1232
    • sga_max_size: 80G
    • sga_target: 80G
    • shared_pool_size: 4G
    • sort_area_size=65536
  2. 4 sockets / 16 vCPU
  3. Virtual
  4. Yes
  5. Yes
  6. Windows Server 2016 (1607)
  7. Yes
  8. The issue went away after the restart, and we didn’t notice any particular time consuming jobs or processes running before the restart. 
  9. Rebuild indexes are scheduled to run every 2. month. It doesn’t seems to have run after the upgrade.
  10. No
  11. Not in use
Userlevel 4
Badge +9

As an alternative to a $2,000 utility, here is a query that will show sessions ranked by CPU usage and shows helpful information about the session so you can track down the process/user. The query displays helpful information like username, machine, program, type, event, and the sql text/command being referenced.

 

select sess_cpu.con_id, 
sess_cpu.sid, 
NVL(sess_cpu.username, 'Oracle Process') username, 
sess_cpu.status, 
sess_cpu.logon_time, 
sess_cpu.machine, sess_cpu.program, sess_cpu.type, sess_cpu.event,
round((sysdate - sess_cpu.logon_time)*1440*60) logon_SECS, 
sess_cpu.value/100 SESS_CPU_SECS, (sess_cpu.value - call_cpu.value)/100 LAST_CPU_SECS,
round((sess_cpu.value/100)/round((sysdate - sess_cpu.logon_time)*1440*60)*100,2) Percent, 
sess_cpu.sql_id,
(select sql_text from v$sql sql where sql.sql_id = sess_cpu.sql_id) sql_text
from
(select se.con_id, se.sql_id, ss.statistic#, se.sid, se.username, se.status, se.program, se.logon_time, sn.name, ss.value,
se.machine, se.type, se.event
from v$session se, v$sesstat ss, v$statname sn
where se.sid = ss.sid
and sn.statistic# = ss.statistic#
and sn.name in ('CPU used by this session') ) sess_cpu,
(select se.con_id, ss.statistic#, se.sid, ss.value, value/100 seconds 
from v$session se, v$sesstat ss, v$statname sn
where se.sid = ss.sid
and sn.statistic# = ss.statistic#
and sn.name in ('CPU used when call started') ) call_cpu
where sess_cpu.sid = call_cpu.sid
and sess_cpu.con_id = call_cpu.con_id
and round((sysdate - sess_cpu.logon_time)*1440*60) > 0
order by percent desc;

 

 

I’m getting an error running this

 

Userlevel 4
Badge +9

Thanks @alanbclark . Is there any information to get from showing the INACTIVE sessions? I see that the query I posted in my first post shows almost the same information, but only showing the ACTIVE sessions. My also shows the SQL running. 

Userlevel 7
Badge +18

You'll need to also join by the SQL child ID. If it's null on the session, you'll want 0 on the library cache. This happens because the same query can be executed with multiple plans depending on table statistics, available indexes, and the parsing schema.

Userlevel 1
Badge +2

Hi,

Could you if possible get a statpack report for that system during the time period when the problem occur. 

 

Could you also describe the typical user behavior during this time frame?

 

Send the statpack report to me for further investigations in IFS Benchmarking lab.


 

Badge +1

Hi,

i hope your issues has been resolved. We went through something similar when upgrading from Oracle 11 → 12. 

Looking from your pfile you have optimizer_features_enable: 11.2.0.4 which did actually give some boost to performance.

However after months of searching and comparing SQL executionplans we ended up with:

optimizer_adaptive_features=FALSE
optimizer_dynamic_sampling=2

The adaptive feature did not work for us at all so we disabled it. We have been running with these settings for a couple of years now and works ok. Big jobs like MRP now runs at same performance as with Oracle 11.

Reply