Friday 27 March 2015

HMGR0152W: CPU Starvation detected. Current thread scheduling delay is 65 seconds.

Hello Everyone,

Today I am discussing a new issue. I was getting the error below in a WAS v6.1 environment. When I searched for the error, I found a very good note, which I am sharing with you along with some clarification.

Error :


HMGR0152W: CPU Starvation detected. Current thread scheduling delay is 65 seconds.
[3/24/15 12:01:10:162 EDT] 00000053 DiscoveryRcv  W   DCSV1115W: DCS Stack DefaultCoreGroup at MemberTestCell01\TestNode01\app_TestNode01_1: MemberTestCell01\TestNode01\nodeagent connection  was closed. Member will  be removed from view. DCS connection status is Discovery|Ptp, receiver closed.

Cause :


The HMGR0152W message indicates that JVM thread scheduling delays are occurring for this process.

The WebSphere Application Server high availability manager component contains thread scheduling delay detection logic that periodically schedules a thread to run and tracks whether the thread was dispatched and run as scheduled. By default, a delay detection thread is scheduled to run every 30 seconds, and it logs an HMGR0152W message if it is not run within 5 seconds of the expected schedule. The message indicates the delay time, that is, the difference between when the thread was expected to get the CPU and when it actually got CPU cycles.
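To make the detection logic above easier to picture, here is a minimal Python sketch of the same idea. It is not the actual high availability manager code. The function names are my own, and treating "more than 5 seconds late" as the trigger is an assumption based on the description above.

```python
import time

DETECT_PERIOD = 30  # seconds between scheduled runs (default quoted above)
DETECT_ERROR = 5    # tolerated delay in seconds before a warning (default quoted above)


def scheduling_delay(expected_run, actual_run):
    """Seconds between when the check was due and when it actually ran."""
    return max(0.0, actual_run - expected_run)


def check_once(expected_run, actual_run, tolerance=DETECT_ERROR):
    """Return a warning string if the delay exceeds the tolerance, else None."""
    delay = scheduling_delay(expected_run, actual_run)
    if delay > tolerance:
        return ("HMGR0152W: CPU Starvation detected. "
                "Current thread scheduling delay is %d seconds." % delay)
    return None


def monitor(period=DETECT_PERIOD, tolerance=DETECT_ERROR, rounds=1):
    """Sleep for `period`, then compare the wake-up time with the schedule."""
    warnings = []
    for _ in range(rounds):
        expected = time.monotonic() + period
        time.sleep(period)  # on a starved machine the wake-up comes late
        warning = check_once(expected, time.monotonic(), tolerance)
        if warning:
            warnings.append(warning)
    return warnings
```

On a healthy machine `monitor()` returns an empty list. If the process cannot get the CPU (paging, long garbage collection pauses, overload), the wake-up arrives late and a warning is produced.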

The HMGR0152W message can occur even when plenty of CPU resource is available. There are a number of reasons why the scheduled thread might not have been able to get the CPU in a timely fashion. Some common causes include the following:

  • The physical memory is overcommitted and paging is occurring.
  • The heap size for the process is too small causing garbage collection to run too frequently and/or too long, blocking execution of other threads.
  • There might simply be too many threads running in the system, and too much load placed on the machine, which might be indicated by high CPU utilization.


Note: If you have an admin agent in your environment, the JVM gets restarted automatically.

Solution :


The HMGR0152W message is attempting to warn you that a condition is occurring that might lead to instability if it is not corrected. Analysis should be performed to understand why the thread scheduling delays are occurring, and what action(s) should be taken. Some common solutions include the following:

  • Adding more physical memory to prevent paging.
  • Tuning the JVM memory (heap size) for optimal garbage collection.
  • Reducing the overall system load to an acceptable value.

If the HMGR0152W messages do not occur very often, and indicate that the thread scheduling delay is relatively short (for example, < 20 seconds), it is likely that no other errors will occur and the message can safely be ignored.
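To apply this rule of thumb to a real log, a small script can pull the delay values out of SystemOut.log. The sketch below is only an illustration. The 20-second threshold comes from the paragraph above, while `max_occurrences` is a hypothetical cut-off I made up for "not very often".

```python
import re

# Matches the message text shown in the error section of this post.
HMGR0152W = re.compile(
    r"HMGR0152W: CPU Starvation detected\. "
    r"Current thread scheduling delay is (\d+) seconds")


def starvation_delays(log_lines):
    """Return the delay (in seconds) of every HMGR0152W message found."""
    delays = []
    for line in log_lines:
        match = HMGR0152W.search(line)
        if match:
            delays.append(int(match.group(1)))
    return delays


def needs_attention(delays, short_delay=20, max_occurrences=3):
    """True if any delay is long or the warnings are frequent.

    max_occurrences is an illustrative value, not an IBM recommendation.
    """
    has_long_delay = any(d >= short_delay for d in delays)
    return has_long_delay or len(delays) > max_occurrences
```

For example, `starvation_delays(open("SystemOut.log"))` gives the list of delays, and `needs_attention(...)` tells you whether they fall outside the "safe to ignore" case.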

The high availability manager thread scheduling delay detection is configurable by setting either of the following 2 custom properties.

  • IBM_CS_THREAD_SCHED_DETECT_PERIOD determines how often a delay detection thread is scheduled to run. The default value of this parameter is 30 (seconds).
  • IBM_CS_THREAD_SCHED_DETECT_ERROR determines how long of a delay should be tolerated before a warning message is logged. By default this value is 5 (seconds).


These properties are scoped to a core group and can be configured as follows:

  1. In the administrative console, click Servers > Core groups > Core groups settings and then select the core group name
  2. Under Additional Properties, click Custom properties > New.
  3. Enter the property name and desired value.
  4. Save the changes.
  5. Restart the server for these changes to take effect.
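The console steps above can probably also be scripted with wsadmin (Jython). The sketch below is an untested assumption on my side, not something taken from the IBM note. It assumes the standard `AdminConfig` scripting object and that core group custom properties are stored under the `customProperties` attribute. The helper name `set_core_group_property` is my own.

```python
def set_core_group_property(admin_config, core_group_name, name, value):
    """Create a custom property on the given core group.

    admin_config is the wsadmin AdminConfig object, passed in explicitly
    so the function can also be exercised outside wsadmin.
    """
    # Look up the configuration ID of the core group.
    core_group = admin_config.getid("/CoreGroup:%s/" % core_group_name)
    if not core_group:
        raise ValueError("core group not found: %s" % core_group_name)
    attrs = [["name", name], ["value", str(value)]]
    # Assumption: custom properties live under the 'customProperties' attribute.
    return admin_config.create("Property", core_group, attrs, "customProperties")
```

Inside wsadmin you would call it as `set_core_group_property(AdminConfig, "DefaultCoreGroup", "IBM_CS_THREAD_SCHED_DETECT_ERROR", 10)`, follow it with `AdminConfig.save()`, and then restart the server as in step 5.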


While it is possible to use the custom properties mentioned above to increase the thread-scheduling-detect-period until the HMGR0152W warning messages no longer occur, this is not recommended. The proper solution is to tune the system to eliminate the thread scheduling delays.

Hope this works in your case too. Kindly comment with your suggestions and queries.

"Effort only fully releases its reward after a person refuses to quit."

 Regards,
 Akhilesh B. Humbe


Tuesday 24 March 2015

DCSV1115W: DCS connection status is View|Ptp, receiver closed.

Hello Everyone,

Hope you are enjoying working with middleware. Errors at JVM startup are very common, and I think we learn a lot from them. I got this error while starting one of my JVMs/servers in a WAS v8.5.0.0 environment.

Error:


W   DCSV1115W: DCS Stack DefaultCoreGroup at Member TestCell01\TestNode01\app_TestNode01_1: Member TestCell01\TestCellManager01\dmgr connection was closed. Member will  be removed from view. DCS connection status is View|Ptp, receiver closed.

Cause:


This is a general error that might be encountered during the server start phase. While starting, the server hangs and does not write any logs. The basic idea is that when you start the server, threads are initialized for the process/job that you want to run on it. Those threads wait for a few resources that they need to run the process/job. At that point a thread may hang because the resources are unavailable.

Solution:


The fix is to kill the server process in the background and then start the server again.

1. On UNIX, you can find the process and kill it with:
$ ps -ef | grep JVM_Name
or
$ ps -ef | grep java
$ kill -9 process_id

2. On Windows, you can find the process and kill it with:
C:\>tasklist | findstr java
C:\>taskkill /F /PID process_id

3. Then start the JVM/Server using startServer.sh or startServer.cmd
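When several JVMs run on the same box, picking the right process ID out of the `ps -ef` output by eye is error-prone. Here is a small Python sketch that does the filtering. It assumes the Linux-style `ps -ef` column layout (UID PID PPID C STIME TTY TIME CMD) and that the server name appears as a separate argument on the java command line. The function name is my own.

```python
def find_server_pids(ps_output, server_name):
    """Return PIDs of java processes whose arguments include server_name."""
    pids = []
    for line in ps_output.splitlines():
        # UID PID PPID C STIME TTY TIME CMD (CMD keeps its spaces)
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line or unexpected format
        words = fields[7].split()
        # Only real java processes; this also skips the "grep ..." line.
        if not words[0].endswith("java"):
            continue
        # Exact argument match, so "server1" does not match "server10".
        if server_name in words[1:]:
            pids.append(int(fields[1]))
    return pids
```

You could feed it the captured output of `ps -ef` and then pass the resulting PID to `kill -9` as in step 1.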



Hope this works in your case too. Kindly comment with your suggestions and queries.

"Effort only fully releases its reward after a person refuses to quit."

 Regards,
 Akhilesh B. Humbe

Thursday 19 March 2015

WASX7111E: Cannot find a match for supplied option

Hello Everyone.

Hope you are enjoying working with middleware. Today I hit an interesting error while deploying an application. It was an .ear file and I was trying to deploy it on WAS 8.5.0.0. I have a script to deploy applications, which ended with the error given below. Then I tried to deploy it using the console and got the same error.

Error: 


WASX7017E: Exception received while running file "/opt/scripts/deploy/was/install.py"; exception information:
com.ibm.ws.scripting.ScriptingException: WASX7111E: Cannot find a match for supplied option: "[Testapplication.war,
Testapplication.war,WEB-INF/web.xml,
WebSphere:cell=TestNode01Cell,node=TestNode01,server=Server1]"
 for task "MapModulesToServers". The supplied option must match with the existing task data in the application and the existing task data are:
"["Archetype Created Web Application" Testapplication.war,WEB-INF/web.xml]

It's an application error. But as middleware consultants, we always need to find the reason for an error so that we can explain it from our side. When I searched for the error, I found some interesting facts. Here I am sharing the cause and the solution, which you can suggest to the application team. And if you are on the application team, you can implement it.

Cause:


While installing an application, WebSphere Application Server tries to map the application modules to the target JVM. While mapping modules, it first looks in web.xml for the module name, which is defined in the <display-name></display-name> element. If this element is not present in web.xml, it uses the whole .war file name as the module name. The error normally appears when the <display-name></display-name> element has an incorrect entry, an extra space, or a case mismatch, so that the module name supplied for MapModulesToServers does not match it exactly.

In my case the module file was Testapplication.war, while the display-name element read <display-name>Archetype Created Web Application</display-name>.
In such a case you can ask your application development team to take action, or use the temporary solution given below.

Solution:


You can modify the web.xml file and either comment out the entire <display-name> element or remove it. The application will then deploy successfully.

Consult the application development team before doing this.
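Before changing anything, it can help to confirm which display name the module actually declares. The Python sketch below reads it from the text of a web.xml file and falls back to the .war file name, following the behaviour described in the cause section. The function names are my own, and I only look at `<display-name>` directly under the root element.

```python
import xml.etree.ElementTree as ET


def read_display_name(web_xml_text):
    """Return the top-level <display-name> text, or None if it is absent."""
    root = ET.fromstring(web_xml_text)
    for child in root:
        # Strip a namespace such as {http://java.sun.com/xml/ns/javaee}
        local_name = child.tag.rsplit("}", 1)[-1]
        if local_name == "display-name":
            return child.text or ""
    return None


def expected_module_name(web_xml_text, war_file_name):
    """Module name as described above: display-name if present, else the .war name."""
    display_name = read_display_name(web_xml_text)
    if display_name is None:
        return war_file_name
    return display_name
```

Comparing the result character by character with the module name used in the deployment script exposes extra spaces or case differences. Note that the function does not trim whitespace, on purpose.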

Hope this works in your case too. Kindly comment with your suggestions and queries.

"Effort only fully releases its reward after a person refuses to quit."

 Regards,
 Akhilesh B. Humbe

Tuesday 17 March 2015

Error cleaning old files: java.lang.NullPointerException

Hello Everyone.

Today I am going to discuss the error below.

Error:


-bash-3.2$ /opt/WebSphere85/profiles/Agent01/bin/stopServer.sh adminagent
Password:
ADMU0116I: Tool information is being logged in file
           /opt/WebSphere85/profiles/Agent01/logs/adminagent/stopServer.log
ADMU0128I: Starting tool with the Agent01 profile
ADMU3100I: Reading configuration for server: adminagent
Error cleaning old files: java.lang.NullPointerException
com.ibm.rmi.ras.Utility.newWriter: could not write to orbmsg.12032015.1310.34.txt : java.io.FileNotFoundException: /home/XXX/orbmsg.12032015.1310.34.txt (Permission denied) 
com.ibm.rmi.ras.Utility.newWriter: could not write to orbtrc.12032015.1310.34.txt : java.io.FileNotFoundException: /home/XXX/orbtrc.12032015.1310.34.txt (Permission denied)


The error shown above is a very common one during startup or shutdown of WebSphere Application Server; I hit it while stopping the adminagent.

Since it shows "Permission denied", I don't have permission on the files that need to be cleaned up during the stop. There may be the following reasons for this:

Cause:


  1. We are trying to stop the adminagent as a user other than was, which does not have permission to read/write these files.
  2. Sometimes the same error occurs even when using the was user. In this case, the adminagent [JVM] was previously started/stopped as the root user, which changed the ownership of some files to root and its group. The was user is now not allowed to clean or replace these files.

Solution:


There is a very simple solution for that.
1. Go to the /opt/WebSphere85/profiles/ location and change the owner of the profile to was:was.
2. In my case it is:
   /opt/WebSphere85/profiles/> chown -R was:was Agent01
   This changes the owner [user/group] of all files in the Agent01 profile to was.
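If you want to see which files are affected before (or after) running chown, a short Python script can list everything under the profile that is not owned by the expected user. This is just a sketch. The function name is my own, and it compares numeric user IDs only, not groups.

```python
import os


def files_not_owned_by(root, expected_uid):
    """Return sorted paths under root whose owner is not expected_uid."""
    mismatched = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat so that broken symlinks do not raise an error
            if os.lstat(path).st_uid != expected_uid:
                mismatched.append(path)
    return sorted(mismatched)
```

For example, run as the was user, `files_not_owned_by("/opt/WebSphere85/profiles/Agent01", os.getuid())` should return an empty list once the chown has been applied.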


Hope this works in your case too. Kindly comment with your suggestions and queries.

"Effort only fully releases its reward after a person refuses to quit."

 Regards,
 Akhilesh B. Humbe

Wednesday 11 March 2015

1 Managed connections are already being used on this thread.

Hello everyone,

Today I got the error below in my WAS 7.0.017 ND environment.

[3/10/15 16:34:13:185 EDT] 000000cb PoolManager   W   Exceeded the number of allowable managed connection on thread 000000cb.  1 managed connections are already being used on this thread.    Managed connection being used on this thread 
MCWrapper id 1462303  Managed connection org.apache.activemq.ra.ActiveMQManagedConnection@1a4bf8a State:STATE_TRAN_WRAPPER_INUSE Thread Id: 000000cb Thread Name: WebContainer : 0 Handle count 0
     Start time inuse Tue Mar 10 16:34:13 EDT 2015 Time inuse 0 (seconds)
     Last allocation time Tue Mar 10 16:34:13 EDT 2015
     getConnection stack trace information:   org.apache.activemq.ra.ActiveMQConnectionFactory.createConnection(ActiveMQConnectionFactory.java:67)
          org.springframework.jms.support.JmsAccessor.createConnection(JmsAccessor.java:184)
          org.springframework.jms.core.JmsTemplate.execute(JmsTemplate.java:456)
          org.springframework.jms.core.JmsTemplate.send(JmsTemplate.java:534)
          org.springframework.jms.core.JmsTemplate.convertAndSend(JmsTemplate.java:612)

This unexpected error appeared in my SystemOut.log, and at the same time my application's performance was at its worst. When I searched for the error on Google, I found one solution suggested frequently, and logically I think it is correct. I didn't implement that solution because in my case the problem was resolved by something different. Here I am going to share both with you: the first is the one I applied, and the second may also help you.

1. Change Log Level:

In my case the solution was to reduce the trace level of the AppServer from *finest back to *info, as the server was too busy writing logs while also serving database connection requests.

2. Increase maxNumberOfMCsAllowableInThread parameter for DataSource:

The error says that 1 managed connection is already being used on this thread, which means only one managed connection is allowed on each thread and it is in use now. However, an application may be designed so that the web layer (controller) makes read-only database calls in addition to the read/write database calls at the service layer. In such a scenario, more than one connection would be involved per thread. So the question is whether WAS can be configured to allow more than 1 managed connection per thread. The answer is yes; we can increase the limit using the steps below.

Procedure:

Go to Data sources > DataSource_Name > Connection pools > Custom properties > New. Create the new custom pool property maxNumberOfMCsAllowableInThread and set it to the value you wish. In our case we set it to 10.



Click on Apply and Save.
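To judge how often the limit is hit, and on which threads, you can count the PoolManager warnings in SystemOut.log before and after the change. The Python sketch below is only an illustration. The regular expression is written against the message text shown at the top of this post, and the function name is my own.

```python
import re
from collections import Counter

# Captures the thread ID and the number of managed connections already in use.
EXCEEDED = re.compile(
    r"Exceeded the number of allowable managed connection on thread (\w+)\.\s+"
    r"(\d+) managed connections are already being used")


def count_exceeded_by_thread(log_lines):
    """Return {thread_id: number of 'Exceeded' warnings} for the given lines."""
    counts = Counter()
    for line in log_lines:
        match = EXCEEDED.search(line)
        if match:
            counts[match.group(1)] += 1
    return dict(counts)
```

If the counts stay high on many WebContainer threads, that hints at an application design that really needs more than one connection per thread, rather than a one-off glitch.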

This worked in my case; I hope it works for you too. Kindly comment with your suggestions and queries.

"Effort only fully releases its reward after a person refuses to quit."

 Regards,
 Akhilesh B. Humbe
