Manually kickstarting a deployment

There are occasions when a Deployment can get stuck in a "PENDING" state.  There are a few possible reasons this can occur.  Below is a list of those reasons along with the steps that can kick-start a stuck deployment.

Reason #1 - Serial Lock

Background: System Events have the capability to be executed serially.  Deployment events are intentionally designed to use this serial execution capability in order to guarantee the consistency of the data being deployed: the order in which you approve/deploy is the order in which the data is applied to your site/catalog.  To manage that serial process, a global lock is used.  The lock is managed in the blc_serial_event_lock table.

Potential Issue: If a serial execution event is triggered on a node and that node is restarted, the lock can be stranded.  If the "SCHEDULED_DEPLOY" row in the blc_serial_event_lock table is marked as locked (e.g. lock_val = 1) and it is not getting cleared, then the lock is stranded.  You can check with the following query:

select * from blc_serial_event_lock where key_val = 'SCHEDULED_DEPLOY';

The Fix: If the serial lock is stranded, we just need to unlock it.  To do that, set lock_val back to 0:

update blc_serial_event_lock set lock_val=0 where key_val = 'SCHEDULED_DEPLOY';
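
After running the update, you can re-run the earlier check to confirm the lock was released; a stuck deployment should then be able to acquire it again.

-- Confirm the lock was released (lock_val should now be 0)
select * from blc_serial_event_lock where key_val = 'SCHEDULED_DEPLOY';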

Reason #2 - Deployment Master

Background: The system elects two "master" nodes: one that manages standard Scheduled Job scheduling (the Job Master) and one that manages Deployment Job scheduling (the Deployment Master).  These nodes are responsible for making sure that Scheduled Jobs are picked up and processed.  The masters are tracked in the blc_sched_job_master table, where two rows should exist: the Job Master (ID=1) and the Deployment Master (ID=2).  Note that the IDs are always 1 and 2.  An associated table that is involved in the election process is the blc_registered_node table (or blc_node_registration for versions < 5.2), which lists all the active nodes in the cluster.

Potential Issue: The nodes go through a "master election" process to designate one node for each of the master roles.  There are scenarios where an old node remains elected in a master role when it should not.  This is rare, but when it happens it is usually associated with an issue launching a new website release.

The Fix: Since there is a "master election" process, the system will self-heal if a master node goes offline.  We can take advantage of that process by deleting the Deployment Master row.  The first step is to verify that the nodes in the blc_registered_node (blc_node_registration) table are accurate.  Second, the Deployment Master row in blc_sched_job_master should match up to an active admin node.  The problem scenario is that "old" nodes are still listed in blc_registered_node and the Deployment Master row in blc_sched_job_master points at one of those old admin nodes.

select * from blc_registered_node; -- for BLC 5.2+
select * from blc_node_registration; -- for BLC < 5.2
select * from blc_sched_job_master;
delete from blc_sched_job_master where sched_job_master_id=2; -- 2 is the deployment master 
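
After deleting the row, the election process should repopulate the Deployment Master with an active admin node.  To confirm, re-check the tables after a short wait; a new row with ID 2 should reappear and correspond to a node listed in the node registration table:

-- Verify a new Deployment Master was elected (a row with ID 2 should reappear)
select * from blc_sched_job_master where sched_job_master_id = 2;
-- Cross-check that it corresponds to one of the currently active nodes
select * from blc_registered_node; -- or blc_node_registration for versions < 5.2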

Reason #3 - Process 

Background: The system uses a process-tracking mechanism to track the progression through the deployment steps.  The process row provides the ability to restart deployments should a system restart or other unexpected event occur.

Potential Issue: On occasion, a process restart might not get triggered after an unexpected event.  In that case, we can force the process to restart.

The Fix: The framework has a table called BLC_PROCESS_STATUS.  This table, together with the BLC_SNDBX_WRKFLW_EVENT.SETUP_STATUS column, is used to keep track of the workflow of a change set.  The ProcessStatus is the process that follows up and checks the SETUP_STATUS (which increments from 100 to 200, etc.) to make sure a change set is carried through to completion.  However, there are times when this process ends due to an unexpected error on the server.  This leaves the entity "stuck", but we can have the framework retry by inserting a new record into this table.

To manually kickstart a deployment, first find the deployments that are stuck in PENDING or IN_PROCESS:

select
  dep.WORKFLOW_DEPLOYMENT_ID,
  dep.STATUS,
  dep.DEPLOY_DATE,
  ev.ACTION_TYPE,
  ev.SETUP_STATUS,
  wrkitem.WRKFLW_SNDBX_ITEM_ID,
  wrkitem.GRP_DESCRIPTION,
  wrkitem.SCHEDULED_DATE,
  wrkitem.ARCHIVED_FLAG,
  wrkitem.CONTAINER_ID
from BLC_SNDBX_WRKFLW_DEPLOY dep
  inner join BLC_SNDBX_WRKFLW_EVENT ev on ev.WORKFLOW_DEPLOY_ID = dep.WORKFLOW_DEPLOYMENT_ID
  inner join BLC_SNDBX_WRKFLW_EVENT_ITEM evitem on evitem.EVENT_ID = ev.WORKFLOW_EVENT_ID
  inner join BLC_SNDBX_WRKFLW_ITEM wrkitem on wrkitem.WRKFLW_SNDBX_ITEM_ID = evitem.ITEM_ID
where
  (dep.STATUS = 'PENDING' or dep.STATUS = 'IN_PROCESS');

Once you have that, you can insert corresponding records into the process status table, which will allow the system to continue on with them.  For example, if we had 2 deployments in PENDING or IN_PROCESS, we would take their WORKFLOW_DEPLOYMENT_IDs (let's say we got 15880 and 15931) and insert a process status record for each.  We want to check first to make sure the process status records don't already exist (in this case we will use the IDs -100 and -101), and if they don't, we insert them.

select * from BLC_PROCESS_STATUS where PROCESS_STATUS_ID in (-100, -101);

Once we verify that they do not exist, we can insert the following records:

INSERT INTO blc_process_status (PROCESS_STATUS_ID, DESCRIPTION, EXECUTION_ID, EXECUTION_TYPE, FRIENDLY_NAME, IS_ARCHIVED, IS_PAUSED, LAST_UPDATED, NAME, PERCENT_COMPLETE, PROCESS_ID, VERSION) VALUES (-100, NULL, 15880, 'APPROVAL_PROCESS_TYPE', NULL, NULL, NULL, '2018-08-06 08:40:21:000', 'SCHEDULED_DEPLOY', NULL, -100, 1);
INSERT INTO blc_process_status (PROCESS_STATUS_ID, DESCRIPTION, EXECUTION_ID, EXECUTION_TYPE, FRIENDLY_NAME, IS_ARCHIVED, IS_PAUSED, LAST_UPDATED, NAME, PERCENT_COMPLETE, PROCESS_ID, VERSION) VALUES (-101, NULL, 15931, 'APPROVAL_PROCESS_TYPE', NULL, NULL, NULL, '2018-08-06 08:40:21:000', 'SCHEDULED_DEPLOY', NULL, -101, 1);
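
After the inserts, the framework should pick these process status records up and resume the stuck deployments.  Using the example IDs from above, you can verify that the records exist and then watch the deployments progress:

-- Confirm the process status records were created
select * from BLC_PROCESS_STATUS where PROCESS_STATUS_ID in (-100, -101);
-- Watch the deployments move out of PENDING / IN_PROCESS
select WORKFLOW_DEPLOYMENT_ID, STATUS, DEPLOY_DATE from BLC_SNDBX_WRKFLW_DEPLOY where WORKFLOW_DEPLOYMENT_ID in (15880, 15931);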