Known Problems with the Cluster Feature

This section describes known problems with the cluster feature of Adaptive Server.

Table 1. Solaris-specific issues

CR #

Description

494933

Extraneous input from shell can cause a hang

If the standard input of the shell that starts the Cluster Edition raises an exception such as SIGHUP, the server stops responding and the error log fills with:
ncheck_quit: no socket table entry for fd 0
ncheck: select, Invalid argument

Workaround: Run the Cluster Edition in the background.
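
For example, a minimal sketch of starting an instance detached from the shell's standard input. The paths, quorum device, and instance name are placeholders taken from examples elsewhere in this section; your configuration may require additional dataserver options.

# Start the instance in the background with stdin redirected away from the
# shell, so exceptions such as SIGHUP on that stdin never reach the server.
# All paths and names below are placeholders.
nohup /mysybase1/ASE-15_0/bin/dataserver \
    --quorum_dev=/dev/raw/raw50m41 \
    --instance_name=mycluster_instance1 \
    < /dev/null >> /mysybase1/mycluster_instance1.console 2>&1 &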

Table 2. Nonplatform-specific cluster-related issues

CR #

Description

575289

CIPC message leak may cause cluster hang

Adaptive Server may leak a CIPC message during client connection failover. Repeated client failovers can eventually exhaust the CIPC regular message pool, leading to a cluster hang. The pool size is controlled by the configuration parameter "CIPC regular message pool size". The default of 8192 regular messages could be exhausted if, for example, 400 clients each failed over 20 times.

Workaround: Increase the number of regular messages if a large number of client connections is configured.
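
For example, a minimal sketch of raising the pool size from isql, assuming "CIPC regular message pool size" is changed with sp_configure like other configuration parameters; the server name, login, password variable, and new value are placeholders.

# Raise the CIPC regular message pool above the 8192 default.
# Server name, login, password variable, and value are placeholders;
# a static parameter may require an instance restart to take effect.
isql -Usa -P"$SA_PASSWORD" -Smycluster_instance1 <<EOF
sp_configure 'CIPC regular message pool size', 16384
go
EOF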

575221

Running dbcc page may encounter a signal 11

In a cluster environment, the dbcc page command may hit a signal 11 (segmentation fault) error.

Adaptive Server continues to run normally, although the connection running the dbcc command is terminated.

Workaround: Rerun the command on another node.

575043

Cannot manage clusters from Adaptive Server plug-in or sybcluster after deploying UAF plug-in

When you deploy a plug-in after manually creating a cluster, or after upgrading from a shared installation mode cluster to private installation mode, you may see the following errors when you run commands such as show cluster config or show cluster status:

ERROR - Exception invoking method getClusterConfiguration
ERROR - java.lang.NullPointerException

This occurs because an incorrect interfaces file location is recorded in the agent configuration.

Workaround: Correct the interfaces file path in the agent configuration:


  1. Determine the correct interfaces file path by querying the quorum device with the qrmutil command:

    /mysybase1/ASE-15_0/bin/qrmutil --quorum_dev=/dev/raw/raw50m41 --display=all

    This example output includes the interfaces path for each instance:

    Cluster configuration id: ebff342a-2eb2-cdc2-9fd9-bdc6de7f3e91
            Cluster name: 'mycluster'
            Max instances: 4
            Master devices: '/dev/raw/raw1g2'
            Config file: ''
    ...............
    Displaying instance 'mycluster_instance1' (1)
            Instance id: 1
            Instance name: 'mycluster_instance1'
            Host node: 'nuno1'
            Primary address: 'nuno1'
            Primary port start: '15100'
            Secondary address: 'nuno1'
            Secondary port start: '15181'
            Errorlog: '/mysybase1/mycluster_instance1.log'
            Config file: '/mysybase1/mycluster.cfg'
            Interfaces path: '/mysybase1'
            Traceflags: ''
            Additional run parameters: ''
    ..........
  2. On each node of the cluster, edit $SYBASE/UAF-2_5/nodes/<hostName>/plugins/<clusterName>/agent-plugin.xml so that the interfaces path matches the value reported for the instance on that node. For example:
    <set-property property="ase.home" value="/mysybase1/ASE-15_0" />
    <set-property property="ase.installation.mode" value="private" />
    <set-property property="ase.interfaces.path" value="/mysybase1" />
    <set-property property="ase.quorum.device" value="/dev/raw/raw50m41" /> 
  3. Shut down and restart the UAF agent on each node, as shown in the sketch below.
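
A minimal sketch of this last step, assuming the UAF agent is controlled by the uafshutdown.sh and uafstartup.sh scripts under $SYBASE/UAF-2_5/bin; the script names and location are assumptions and may differ in your installation.

# Run on each node of the cluster after correcting agent-plugin.xml.
# Script names and locations are assumptions; adjust to your installation.
$SYBASE/UAF-2_5/bin/uafshutdown.sh
$SYBASE/UAF-2_5/bin/uafstartup.sh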

574889

A cluster view change requires a wait before accessing the sysprocesses and monProcess tables

Attempts to populate tables like sysprocesses and monProcess may not work while a cluster view change is in progress, such as an instance join, leave, or failure.

Workaround: Wait a few minutes before running any commands that access the monProcess table, or that try to materialize the dynamically built sysprocesses fake system table. This includes commands that use the following stored procedures (a command-line sketch of the workaround follows the list):
  • sp_auth
  • sp_dropengine
  • sp_droplogin
  • sp_clearpsexe
  • sp_client_addr
  • sp_clusterconnection
  • sp_dbxt_reload_defaults
  • sp_dbxt_sprocs
  • sp_dropuser
  • sp_familylock
  • sp_ha_verification
  • sp_helpapptrace
  • sp_locklogin
  • sp_monitor_connection
  • sp_monitor_procstack
  • sp_monitor_statement
  • sp_multdb_show
  • sp_setpsexe
  • sp_showpsexe
  • sp_who
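
A minimal sketch of the workaround from the command line; the server name, login, password variable, and wait time are placeholders, and the delay simply gives the cluster view change time to complete.

# Wait a few minutes after an instance join, leave, or failure before
# querying sysprocesses or monProcess. Names and credentials are placeholders.
sleep 180
isql -Usa -P"$SA_PASSWORD" -Smycluster_instance1 <<EOF
sp_who
go
EOF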

574863

Both private interconnects must be up before the cluster starts

The Cluster Edition of Adaptive Server uses a cluster input file that records the interconnect addresses, the server instances, and the number of server instances in the cluster. Adaptive Server supports two private interconnects, PRIMARY and SECONDARY, for communication among the server instances in the cluster.

The cluster input file can define both links (primary and secondary) or only the primary link. If both links are defined, make sure that both are up and active before you boot the cluster.

Configurations in which both private interconnects are defined in the cluster input file but either of them is down before, or while, the cluster boots are not supported.
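
For illustration, a fragment of an instance entry in a cluster input file that defines both interconnects. The field names mirror the qrmutil output shown earlier in this section, but the exact keywords and section layout depend on your Adaptive Server version, so treat this as a sketch rather than authoritative syntax.

# Illustrative fragment only; keywords and layout may differ by version.
# Both the PRIMARY and SECONDARY addresses defined here must be up and
# active before the cluster is booted.
[instance]
name = mycluster_instance1
node = nuno1
primary address = nuno1
primary port start = 15100
secondary address = nuno1
secondary port start = 15181
errorlog = /mysybase1/mycluster_instance1.log
config file = /mysybase1/mycluster.cfg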

574616

Attempts to localize from sybcluster or plug-in in private installation mode can hang

If you use sybcluster or the Adaptive Server plug-in to perform localization on a cluster that uses a private installation, some localization actions, such as installing or changing default character sets and sort orders, can cause the system to hang.

Workaround: The commands should complete if you are only adding or removing languages. To add or change default character sets or sort orders on a private installation mode cluster, however, do not use sybcluster or the Adaptive Server plug-in. Instead, use the charset utility directly. See Chapter 18, "Customizing Localization for the Cluster Edition," in the Clusters Users Guide for details on using the charset utility.
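
For example, a minimal sketch of loading a character set and sort order directly with the charset utility; the login, password variable, server name, sort-order file, and character set name are placeholders, and the exact arguments your installation needs are described in the Utility Guide.

# Load a sort order and character set directly into one instance.
# Login, server name, and the binary.srt/utf8 arguments are placeholders.
/mysybase1/ASE-15_0/bin/charset -Usa -P"$SA_PASSWORD" -Smycluster_instance1 binary.srt utf8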

573197

Starting an instance or cluster may display the server boot command without actually invoking the command on the target node.

On a hardware host where the operating system cannot complete a restart of an Adaptive Server process within 10 seconds, you might see a message from sybcluster similar to:

p6sd0_12738> start instance p6sd0_12738_ns4
INFO  - Starting the cluster p6sd0_12738 instance p6sd0_12738_ns4 using the operating system command:
/remote/server/user/ase1503ce/aix_c3/ASE-15_0/bin/dataserver --quorum_dev=/dev/rhdisk15 --instance_name=p6sd0_12738_ns4 -N /p6sdcperf_shared4/s16018278/node_s4/p6sd0_12738_ns4.prop

INFO  - Instance p6sd0_12738_ns4 status is Down. The process will check every 10 seconds for 6 more times while the instance completes its startup... 
INFO  - Instance p6sd0_12738_ns4 status is Down. The process will check every 10 seconds for 5 more times while the instance completes its startup... 
INFO  - Instance p6sd0_12738_ns4 status is Down. The process will check every 10 seconds for 4 more times while the instance completes its startup... 
INFO  - Instance p6sd0_12738_ns4 status is Down. The process will check every 10 seconds for 3 more times while the instance completes its startup... 
INFO  - Instance p6sd0_12738_ns4 status is Down. The process will check every 10 seconds for 2 more times while the instance completes its startup... 
INFO  - Instance p6sd0_12738_ns4 status is Down. The process will check every 10 seconds for 1 more times while the instance completes its startup... 

Further attempts to retry this command from sybcluster may still fail to restart Adaptive Server.

Workaround: Restart Adaptive Server manually on the target node by using the operating system command printed in the sybcluster message. For example, the command for cluster p6sd0_12738 instance p6sd0_12738_ns4 is:

/remote/server/user/ase1503ce/aix_c3/ASE-15_0/bin/dataserver --quorum_dev=/dev/rhdisk15 --instance_name=p6sd0_12738_ns4 -N /p6sdcperf_shared4/s16018278/node_s4/p6sd0_12738_ns4.prop

568101

Misleading error message created during cluster creation

While creating a new cluster, you may see error messages such as:

Create the cluster now?  [ Y ] 
INFO  - Creating the Cluster Agent plugin on host address vcsone154.cdc.veritas.com using agent: vcsone154.cdc.veritas.com:9999
INFO  - Creating the Cluster Agent plugin on host address vcsone155.cdc.veritas.com using agent: vcsone155.cdc.veritas.com:9999
2009-04-10 14:09:56,177 INFO  [RMI TCP Connection(26)-10.198.90.155] Plugin registered. Updating lookup info...
2009-04-10 14:09:56,707 ERROR [RMI TCP Connection(26)-10.198.90.155] The cluster entry mycluster24 did not contain any servers    <------------------ here is the error message
2009-04-10 14:09:56,710 WARN  [RMI TCP Connection(26)-10.198.90.155] Unable to obtain the quorum configuration. The cluster may not be configured.

Workaround: These intermittent messages are harmless and do not affect cluster creation. You can ignore them.

486377

Setting 'cluster heartbeat retries' to a value higher than 2 is not permitted

Configuring cluster heartbeat retries to a value greater than 2 could result in a total cluster shutdown during failover handling for clusters with more than 2 instances.

Workaround: Do not configure cluster heartbeat retries to a value higher than 2.
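
For example, a minimal sketch of checking and setting the parameter from isql, assuming 'cluster heartbeat retries' is managed with sp_configure like other configuration parameters; the server name, login, and password variable are placeholders.

# Keep cluster heartbeat retries at 2 or lower; higher values can trigger
# a total cluster shutdown during failover handling.
isql -Usa -P"$SA_PASSWORD" -Smycluster_instance1 <<EOF
sp_configure 'cluster heartbeat retries'
go
sp_configure 'cluster heartbeat retries', 2
go
EOF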

485070

Restarting an unfinished Cluster Creation wizard session

If you exit from the Cluster Creation wizard during a cluster configuration and then restart the wizard session using the same configuration parameters (cluster name, instance name, and so on), the wizard could have already created some configuration files and devices.

Workaround: Before you restart a wizard session (a command-line sketch follows these steps):


  1. Stop the srvbuildres or dataserver utilities, if either is running.
  2. Stop the UAF agents on all nodes.
  3. Remove the $SYBASE_UA/nodes/node_name/plugins/cluster_name directory that the wizard session created for the cluster.
  4. Remove the interfaces file entries for the cluster you tried to create.
  5. Restart the UAF agents on all nodes.
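
A command-line sketch of steps 2, 3, and 5, assuming the UAF agent is controlled by uafshutdown.sh and uafstartup.sh; the script names and locations, node name, and cluster name are placeholders, and the interfaces file (step 4) still has to be edited by hand.

# Placeholders; adjust before running on each node of the cluster.
NODE_NAME=node1
CLUSTER_NAME=mycluster
# Stop the UAF agent (assumed script name and location).
$SYBASE_UA/bin/uafshutdown.sh
# Remove the plug-in directory created by the wizard session.
rm -rf "$SYBASE_UA/nodes/$NODE_NAME/plugins/$CLUSTER_NAME"
# Remove the cluster's entries from the interfaces file by hand, then
# restart the UAF agent (assumed script name and location).
$SYBASE_UA/bin/uafstartup.sh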

483651

Incorrect Cluster failover can occur

If an instance is starting while the rest of the cluster is performing a diagnostic shared memory dump, the instance that is starting may incorrectly perform a cluster takeover. This occurs only if automatic cluster takeover is set to 1 or if the --cluster_takeover option is passed to dataserver. In environments without I/O fencing enabled, this may lead to data corruption.

Workaround: Avoid starting an instance while a shared memory dump is taking place. Configure automatic cluster takeover to 0.
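
For example, a minimal sketch of disabling automatic takeover from isql, assuming 'automatic cluster takeover' is managed with sp_configure like other configuration parameters; the server name, login, and password variable are placeholders.

# Prevent a starting instance from taking over the cluster while a
# diagnostic shared memory dump is in progress.
isql -Usa -P"$SA_PASSWORD" -Smycluster_instance1 <<EOF
sp_configure 'automatic cluster takeover', 0
go
EOF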


Created June 19, 2009. Send feedback on this help topic to Sybase Technical Publications: pubs@sybase.com