Export the configuration files of the Orchestrator that failed:
If the Orchestrators run R80.20SP with the Jumbo Hotfix Accumulator Take 310 and higher:
Note - Starting from R80.20SP Jumbo Hotfix Accumulator Take 310, each Orchestrator keeps the configuration of the peer Orchestrator as a local backup.
Connect to the command line on the Orchestrator that continues to work.
Log in to Gaia Clish.
Export the configuration files of the Orchestrator that failed:
set maestro export remote orchestrator id <ID of the Orchestrator That Failed> configuration archive-name <Name of Output Archive File> path <Full Path on Orchestrator That Continues to Work>
Example:
set maestro export remote orchestrator id 1_2 configuration archive-name Export_from_failed_Orch_1_2 path /var/log/
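To verify that the archive was created, you can list it in the Expert mode (the file name below follows the example above; based on the import example later in this procedure, the exported archive gets the .tgz extension):
ls -l /var/log/Export_from_failed_Orch_1_2*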
If the Orchestrators run R80.20SP with the Jumbo Hotfix Accumulator Take 309 and lower:
Connect to the command line on the Orchestrator that failed.
Log in to the Expert mode.
Export the configuration files of the Orchestrator that failed:
tar -czf <Full Path on Orchestrator That Failed>/<Name of Output Archive File>.tgz -C /etc maestro.json sgdb.json smodb.json maestro_full.json
Example:
tar -czf /var/log/Export_from_failed_Orch_1_2.tgz -C /etc maestro.json sgdb.json smodb.json maestro_full.json
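To verify the contents of the archive before you copy it, you can list them:
tar -tzf /var/log/Export_from_failed_Orch_1_2.tgz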
On the Orchestrator that failed, copy the archive with the exported configuration files to your computer.
Use an SCP client (WinSCP requires the default user shell to be /bin/bash).
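For example, from a Linux or macOS computer (the user name "admin" and the IP address 192.0.2.10 below are examples only - use your administrator credentials and the management IP address of the Orchestrator that failed):
scp admin@192.0.2.10:/var/log/Export_from_failed_Orch_1_2.tgz .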
On the Orchestrator that failed, stop the Orchestrator service:
Connect to the command line on the Orchestrator that failed.
Log in to the Expert mode.
Stop the service:
orchd stop
On the Orchestrator that failed, write down the port numbers to which the Uplink cables, the Downlink cables, and the Sync cables are connected.
On the Orchestrator that failed, disconnect all cables.
Important - Do not halt (shut down) the Orchestrator that failed. It might reboot, and the Orchestrator services will start again.
Remove the Orchestrator that failed from the rack.
Install the replacement Orchestrator in the rack.
Do not connect the power or network cables yet. You connect them gradually in the later steps.
On the replacement Orchestrator, connect only these cables:
The network cable from your computer to the Orchestrator's MGMT interface.
The console cable from your computer to the Orchestrator's Console Port.
In your console client, configure these settings:
Baud Rate - 9600
Data bits - 8
Stop bits - 1
Parity - None
Flow Control - None
The power cables to the Orchestrator's Power Supply Units.
On the replacement Orchestrator, install the same Take of the R80.20SP Jumbo Hotfix Accumulator as installed on the working Orchestrator (in our example, "1_1").
Wait for the Orchestrator to reboot.
On the replacement Orchestrator, stop the services:
Connect to the command line on the replacement Orchestrator through the MGMT interface (over SSH).
Log in to the Expert mode.
Stop the Orchestrator service:
orchd stop
Stop the LLDP service:
tellpm process:lldpd
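To verify that the LLDP process stopped, you can run this standard Linux check (no output means the process is not running):
pgrep lldpd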
On the replacement Orchestrator, copy the archive with the exported configuration files from your computer to a directory of your choice (for example, /var/log/).
Use an SCP client (WinSCP requires the default user shell to be /bin/bash).
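For example, from a Linux or macOS computer (the user name "admin" and the IP address 192.0.2.20 below are examples only - use your administrator credentials and the management IP address of the replacement Orchestrator):
scp Export_from_failed_Orch_1_2.tgz admin@192.0.2.20:/var/log/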
On the replacement Orchestrator, import the configuration files you collected earlier:
If the Orchestrator runs R80.20SP with the Jumbo Hotfix Accumulator Take 310 and higher:
Connect to the command line on the replacement Orchestrator.
Log in to Gaia Clish.
Import the configuration files:
set maestro import configuration archive-name <Name of Output Archive File> path <Full Local Path>
Example:
set maestro import configuration archive-name Export_from_failed_Orch_1_2.tgz path /var/log/
If the Orchestrator runs R80.20SP with the Jumbo Hotfix Accumulator Take 309 and lower:
Connect to the command line on the replacement Orchestrator.
Log in to the Expert mode.
Unpack the configuration files to the /etc/ directory:
tar -xzf <Full Local Path>/<Name of Output Archive File>.tgz -C /etc
Example:
tar -xzf /var/log/Export_from_failed_Orch_1_2.tgz -C /etc
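To verify that the files were extracted, you can list them:
ls -l /etc/maestro.json /etc/sgdb.json /etc/smodb.json /etc/maestro_full.json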
On all Orchestrators, make sure the configuration files are the same.
The MD5 checksums must be the same on the two Orchestrators.
If the MD5 checksums are different:
Copy the /etc/sgdb.json file from the Orchestrator that continues to work ("1_1") to the /etc/ directory on the replacement Orchestrator ("1_2").
Check the MD5 checksums again.
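For example, to compute the checksums, run this command in the Expert mode on each Orchestrator and compare the outputs (the file list assumes the default locations shown earlier in this procedure):
md5sum /etc/maestro.json /etc/sgdb.json /etc/smodb.json /etc/maestro_full.json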
On the replacement Orchestrator, connect these cables to the ports with the same numbers as on the Orchestrator that failed:
The Downlink cables
The Sync cable(s)
Important - Make sure you connected the cables to the correct ports.
On the replacement Orchestrator, start the Orchestrator service:
Connect to the command line on the replacement Orchestrator.
Log in to the Expert mode.
Disable the Link State Propagation (LSP):
jsont -f /etc/smodb.json -s /orch_lsp_state -v off
Start the Orchestrator service:
orchd start
Enable the Link State Propagation (LSP):
jsont -f /etc/smodb.json -s /orch_lsp_state -v on
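To confirm the change, you can inspect the value directly in the file:
grep orch_lsp_state /etc/smodb.json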
Restart the Orchestrator service (this also starts the LLDP service):
orchd restart
On all Orchestrators, make sure traffic can pass between the Orchestrators:
In a Single Site deployment:
On the Orchestrator that continues to work (in our example, "1_1"), send pings to the replacement Orchestrator (in our example, "1_2"):
ping 1_2
In a Dual Site deployment:
On the replacement Orchestrator (in our example, "1_2"), send pings to each working Orchestrator:
ping 1_1
ping 2_1
ping 2_2
Make sure the Security Group Members can pass traffic to each other:
Connect to the command line on the Security Group.
Log in to the Expert mode.
Identify the Security Group Member that runs as SMO:
asg stat -i tasks
Examine the cluster state of the Security Group Members.
On the SMO Security Group Member, run:
cphaprob state
The output must show that all Security Group Members are active.
Send pings between Security Group Members:
Connect to one of the Security Group Members (in our example, we connect to the first one - "1_1"):
member 1_1
On this Security Group Member, send pings to any other Security Group Member (in our example, we send pings to the second one - "1_2" / "2_2"):
In a Single Site deployment:
ping 1_2
In a Dual Site deployment:
ping 1_2
ping 2_2
On the replacement Orchestrator, connect the Uplink cables to the ports with the same numbers as on the Orchestrator that failed.
On each Security Group, make sure all links are up:
Connect to the command line on the Security Group.
Examine the state of links:
asg_if
Procedure for Orchestrators that run the R80.20SP version with the Jumbo Hotfix Accumulator Take 309 and lower - when the Orchestrator that failed is not accessible anymore, or after restoring to factory default
Important - Follow the procedure below if the Orchestrator that failed is not accessible, or if its configuration is empty (for example, after restoring to factory default).
On the Orchestrator that continues to work, export the required configuration files:
Connect to the command line.
Log in to the Expert mode.
Copy these files from the Orchestrator that continues to work to your computer:
/etc/maestro_remote.json
/etc/maestro_full.json
/etc/sgdb.json
/etc/smodb.json
Use an SCP client (WinSCP requires the default user shell to be /bin/bash).
On the Orchestrator that failed, write down the port numbers to which the Uplink cables, the Downlink cables, and the Sync cables are connected.
Remove the Orchestrator that failed from the rack.
Install the replacement Orchestrator in the rack.
Do not connect the power or network cables yet. You connect them gradually in the later steps.
On the replacement Orchestrator, connect only these cables:
The network cable from your computer to the Orchestrator's MGMT interface.
The console cable from your computer to the Orchestrator's Console Port.
In your console client, configure these settings:
Baud Rate - 9600
Data bits - 8
Stop bits - 1
Parity - None
Flow Control - None
The power cables to the Orchestrator's Power Supply Units.
On the replacement Orchestrator, install the same Take of the R80.20SP Jumbo Hotfix Accumulator as installed on the working Orchestrator (in our example, "1_1").
Wait for the Orchestrator to reboot.
On the replacement Orchestrator, stop the services:
Connect to the command line through the MGMT interface (over SSH).
Log in to the Expert mode.
Stop the Orchestrator service:
orchd stop
Stop the LLDP service:
tellpm process:lldpd
On the replacement Orchestrator, copy these files from your computer to the /etc/ directory:
maestro_remote.json
maestro_full.json
sgdb.json
smodb.json
Use an SCP client (WinSCP requires the default user shell to be /bin/bash).
On the replacement Orchestrator, edit the configuration files (in the Expert mode):
Rename the file:
from /etc/maestro_remote.json
to /etc/maestro.json
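For example, in the Expert mode:
mv /etc/maestro_remote.json /etc/maestro.json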
Edit the file /etc/smodb.json and change the value of the member_id parameter to the ID of the Orchestrator that failed (in our example, "2").
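One way to change the value is with the jsont tool used elsewhere in this procedure (the JSON path /member_id below is an assumption - verify the location of the parameter in the file first, for example with "grep member_id /etc/smodb.json"):
jsont -f /etc/smodb.json -s /member_id -v 2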
On all Orchestrators, make sure the configuration files are the same.
Procedure for Orchestrators that run the R81.10 version
On the Orchestrator that continues to work, export the configuration files for the Orchestrator that failed:
Connect to the command line on the Orchestrator that continues to work.
Log in to Gaia Clish.
Export the configuration files for the Orchestrator that failed:
set maestro export remote orchestrator id <ID of the Orchestrator That Failed> configuration archive-name <Name of Output Archive File> path <Full Path on Orchestrator That Continues to Work>
Example:
set maestro export remote orchestrator id 1_2 configuration archive-name Export_from_failed_Orch_1_2 path /var/log/
On the Orchestrator that continues to work, copy the archive with the exported configuration files to your computer.
Use an SCP client (WinSCP requires the default user shell to be /bin/bash).
On the Orchestrator that failed, stop the Orchestrator service:
Connect to the command line on the Orchestrator that failed.
Log in to the Expert mode.
Stop the service:
orchd stop
On the Orchestrator that failed, write down the port numbers to which the Uplink cables, the Downlink cables, and the Sync cables are connected.
On the Orchestrator that failed, disconnect all cables.
Important - Do not halt (shut down) the Orchestrator that failed. It might reboot, and the Orchestrator services will start again.
Remove the Orchestrator that failed from the rack.
Install the replacement Orchestrator in the rack.
Do not connect the power or network cables yet. You connect them gradually in the later steps.
On the replacement Orchestrator, connect only these cables:
The network cable from your computer to the Orchestrator's MGMT interface.
The console cable from your computer to the Orchestrator's Console Port.
In your console client, configure these settings:
Baud Rate - 9600
Data bits - 8
Stop bits - 1
Parity - None
Flow Control - None
The power cables to the Orchestrator's Power Supply Units.
On the replacement Orchestrator, install the same Take of the R81.10 Jumbo Hotfix Accumulator as installed on the working Orchestrator (in our example, "1_1").
Wait for the Orchestrator to reboot.
Important - Choose not to activate the Orchestrator. If you already activated it, run the "orchd stop" command in the Expert mode.
On the replacement Orchestrator, copy the archive with the exported configuration files from your computer to a directory of your choice (for example, /var/log/).
Use an SCP client (WinSCP requires the default user shell to be /bin/bash).
On the replacement Orchestrator, import the configuration files you collected earlier:
Connect to the command line on the replacement Orchestrator.
Log in to Gaia Clish.
Import the configuration files:
set maestro import configuration archive-name <Name of Output Archive File> path <Full Local Path>
Example:
set maestro import configuration archive-name Export_from_failed_Orch_1_2.tgz path /var/log/
On all Orchestrators, make sure the configuration files are the same.
The MD5 checksums must be the same on the two Orchestrators.
If the MD5 checksums are different:
Copy the /etc/sgdb.json file from the Orchestrator that continues to work ("1_1") to the /etc/ directory on the replacement Orchestrator ("1_2").
Check the MD5 checksums again.
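As in the previous procedure, you can compute the checksums in the Expert mode on each Orchestrator and compare the outputs:
md5sum /etc/maestro.json /etc/sgdb.json /etc/smodb.json /etc/maestro_full.json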
On the replacement Orchestrator, connect these cables to the ports with the same numbers as on the Orchestrator that failed:
The Downlink cables
The Sync cable(s)
Important - Make sure you connected the cables to the correct ports.
On the replacement Orchestrator, start the Orchestrator service:
Log in to the Expert mode.
Disable the Link State Propagation (LSP):
jsont -f /etc/smodb.json -s /orch_lsp_state -v off
Check the status of the Orchestrator services:
orchd status
If the Orchestrator is not active, activate it (this also starts the Orchestrator services):
orchd activate
If the Orchestrator is already active, start the Orchestrator services:
orchd start
Enable the Link State Propagation (LSP):
jsont -f /etc/smodb.json -s /orch_lsp_state -v on
Restart the Orchestrator port monitoring daemon with these commands:
tellpm process:ssm_pmd
tellpm process:ssm_pmd t
Note - It may take a few seconds for the Orchestrator to discover the connected Security Appliances for the first time. In the Expert mode, run the "watch -d -n 1 orch_stat" command and wait for all the LAGs to be in the "up" status.
On all Orchestrators, make sure traffic can pass between the Orchestrators:
In a Single Site deployment:
On the Orchestrator that continues to work (in our example, "1_1"), send pings to the replacement Orchestrator (in our example, "1_2"):
ping 1_2
In a Dual Site deployment:
On the replacement Orchestrator (in our example, "1_2"), send pings to each working Orchestrator:
ping 1_1
ping 2_1
ping 2_2
Make sure the Security Group Members can pass traffic to each other:
Connect to the command line on the Security Group.
Log in to the Expert mode.
Identify the Security Group Member that runs as SMO:
asg stat -i tasks
Examine the cluster state of the Security Group Members.
On the SMO Security Group Member, run:
cphaprob state
The output must show that all Security Group Members are active.
Send pings between Security Group Members:
Connect to one of the Security Group Members (in our example, we connect to the first one - "1_1"):
member 1_1
On this Security Group Member, send pings to any other Security Group Member (in our example, we send pings to the second one - "1_2" / "2_2"):
In a Single Site deployment:
ping 1_2
In a Dual Site deployment:
ping 1_2
ping 2_2
On the replacement Orchestrator, connect the Uplink cables to the ports with the same numbers as on the Orchestrator that failed.
On each Security Group, make sure all links are up:
Connect to the command line on the Security Group.