Common reason why you get server failed_not_restartable after a VM restart

  • 20 April 2021
  • 5 replies
  • 595 views

Userlevel 6
Badge +9

This is a common environment issue that I have noticed. It is most often seen in internal environments, where VMs are shut down over the weekend.

 

Error: After reconfiguring or restarting, an error related to the managed server is thrown, and the managed server may even go into the 'failed_not_restartable' state.

 

Resolution:  

  • Open the following config file: C:\IFS\DEV\wls_domain\DEV8\config\config.xml
  • Search for 'listen-address'.
  • For the managed server, an IP address will be shown instead of the VM name. Change it back to the VM name.

The same change can be made to resolve Admin Server failures too.
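
For anyone who prefers to check this with a script, here is a minimal Python sketch of the search described above. It assumes config.xml keeps each server's name and listen-address as child elements of a server element; the sample XML, host name and IP address below are made up for illustration and are not taken from a real environment. The script only reports entries and does not modify the file.

```python
import ipaddress
import xml.etree.ElementTree as ET

# Hypothetical sample that mimics the assumed layout of config.xml.
SAMPLE = """
<domain xmlns="http://xmlns.oracle.com/weblogic/domain">
  <server>
    <name>AdminServer</name>
    <listen-address>myvm01</listen-address>
  </server>
  <server>
    <name>ManagedServer1</name>
    <listen-address>10.0.0.15</listen-address>
  </server>
</domain>
""".strip()


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def is_ip_address(value):
    """Return True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value.strip())
        return True
    except ValueError:
        return False


def find_ip_listen_addresses(config_xml):
    """Return (server_name, listen_address) pairs whose address is an IP."""
    root = ET.fromstring(config_xml)
    hits = []
    for element in root.iter():
        if local_name(element.tag) != "server":
            continue
        name, address = None, None
        for child in element:
            tag = local_name(child.tag)
            if tag == "name":
                name = (child.text or "").strip()
            elif tag == "listen-address":
                address = (child.text or "").strip()
        if address and is_ip_address(address):
            hits.append((name, address))
    return hits


for server, address in find_ip_listen_addresses(SAMPLE):
    print(f"{server}: listen-address is an IP ({address}), expected the VM name")
```

To run it against a real environment, read the contents of config.xml (text or bytes) and pass them to find_ip_listen_addresses instead of SAMPLE.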

 

I believe this should save a lot of time otherwise spent investigating logs and recreating instances.

The above is in addition to checking the .lok files in wls_domain.


5 replies

Userlevel 6
Badge +11

Sometimes we also need to delete the persistent store files (in addition to the .lok files) to overcome the FAILED_NOT_RESTARTABLE error. The locations for each server are listed below (APPS10).

  1. Admin Server

*.dat files under

wls_domain\<INSTANCE_NAME>\servers\AdminServer\data\store\default

wls_domain\<INSTANCE_NAME>\servers\AdminServer\data\store\diagnostics

  2. Main Server

*.dat files under

wls_domain\<INSTANCE_NAME>\servers\MainServer1\data\store\default

wls_domain\<INSTANCE_NAME>\servers\MainServer1\data\store\diagnostics

  3. Int Server

*.dat files under

wls_domain\<INSTANCE_NAME>\servers\IntServer\data\store\default

wls_domain\<INSTANCE_NAME>\servers\IntServer\data\store\diagnostics
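
If it helps, the locations above can be checked with a small Python sketch like the one below. It only lists the store files for one server so you can review them before deleting anything by hand. The function name and the case-insensitive match on the .dat extension are my own assumptions, not part of any IFS tooling.

```python
from pathlib import Path

# Store sub-folders named in the list above.
STORE_SUBDIRS = ("default", "diagnostics")


def find_store_files(instance_dir, server_name):
    """List *.dat persistent store files for one server (read-only)."""
    store_root = Path(instance_dir) / "servers" / server_name / "data" / "store"
    found = []
    for subdir in STORE_SUBDIRS:
        folder = store_root / subdir
        if not folder.is_dir():
            continue  # folder missing for this server, nothing to list
        for entry in sorted(folder.iterdir()):
            # Compare the extension case-insensitively (.dat or .DAT).
            if entry.is_file() and entry.suffix.lower() == ".dat":
                found.append(entry)
    return found
```

Call it with the full path to wls_domain\<INSTANCE_NAME> and a server name such as AdminServer, MainServer1 or IntServer, and stop the relevant server before removing any file it reports.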

Userlevel 6
Badge +13

I remember this being discussed during the IFS Middleware Server administration training. When the host server is restarted without shutting down the IFS application server instance, that is an abrupt shutdown. Most of the time, the state files that maintain the server status register this event as a failure, and when the host server comes back online they show failed_not_restartable as the status of the servers. (The node managers will try to bring up the servers after the host server reboot if the node manager Windows services are set to start automatically.)

To avoid this, the best approach is to perform a “Graceful Shutdown” of the application server, that is, an orderly shutdown of the IFS application before restarting the host server. This way, the IFS application server state files register the correct states, and the node managers will not try to bring up the servers after the host server reboot. Once the restart is done, the IFS application server can be started manually.

 

 

Userlevel 7
Badge +12

In addition to what others have mentioned, I remember seeing this problem on APP8 environments because of this listen address issue. From APP9 onwards, we no longer have to specify a listen address when configuring managed servers; you can leave the field empty during MWS configuration. With an empty listen address, the Managed Server listens on all available IP addresses.

Still, from time to time you will get this error even in APP9 and APP10. As @KasunBalasooriya has mentioned, the best way to avoid the FAILED_NOT_RESTARTABLE error is to ensure graceful shutdowns and startups. After an ungraceful shutdown, some lock files are left behind in the following locations:

\\<IFS_HOME>\wls_domain\<InstanceID>\servers\<ServerName>\data\ldap\ldapfiles\EmbeddedLDAP.lok


\\<IFS_HOME>\wls_domain\<InstanceID>\servers\<ServerName>\ManagedServer1.lok

These lock files should be present only while the relevant server is running. If you see them while getting the FAILED_NOT_RESTARTABLE error, you can delete these 2 files and try to bring up the server again.
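
As a rough aid, a Python sketch along these lines can list any leftover .lok files under an instance's servers folder. The function name is hypothetical, and the script only reports what it finds; it does not delete anything and cannot tell whether a server is actually running, so check that yourself first.

```python
from pathlib import Path


def find_lock_files(instance_dir):
    """Return all *.lok files below <instance_dir>/servers (read-only)."""
    servers_dir = Path(instance_dir) / "servers"
    if not servers_dir.is_dir():
        return []
    return sorted(
        entry
        for entry in servers_dir.rglob("*")
        # Compare the extension case-insensitively (.lok or .LOK).
        if entry.is_file() and entry.suffix.lower() == ".lok"
    )
```

Pass it the full path to <IFS_HOME>\wls_domain\<InstanceID>. If it reports files while the servers are stopped, those are the candidates to remove.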

Userlevel 5
Badge +8

As @Imal Thiunuwan mentioned, the common FAILED_NOT_RESTARTABLE issue on the APP 10 track is resolved by deleting the persistent store files.

The common cause of this sort of issue is a sudden restart or shutdown of the application server machine or VM. Because the middleware application (either Main Server or Admin Server) was not stopped gracefully (using the administrate script or Windows services), data can be corrupted, often in the persistent data store where the middleware application keeps configuration and data for high-speed access.

These issues can be resolved in one of two ways:

1.        Restore the IFSHOME files from a recent file/VM backup and restart the services.

              This is the recommended method, as we don’t know what other subsystems are affected (e.g. Main Server, Admin Server, Int Server).

2.        For immediate recovery, you can try removing the corrupted persistent data file and restarting the relevant service. This recreates the data stores and should recover the system if the Admin Server is affected for the reason mentioned above.

              We recommend doing this with the help of an IFS support consultant if you do not have the required knowledge. If you have prior knowledge, it is OK to go ahead. There are no negative implications to trying this solution. Once more, option 1 is the most reliable way to recover from such a state.

Userlevel 4
Badge +5

I’ve tried to use this tool in the past to start the VM:
http://selfservice.ifsworld.com/rnd/ResourceMngt.aspx
