Estimated time to complete: 2 hours
Operable component owner: OELCM
Skill profile: deployment engineer
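18.1. Check configuration
To verify the quality, security, and efficacy of the Google Distributed Cloud (GDC) air-gapped hardware and software assets delivered by HPE, and to confirm that they are ready for production, use the validation CLI from the Distributed Cloud release.
The validation suite tests the health, installation, and configuration of the devices. It validates servers, network switches, file and block storage, object storage, firewalls, HSMs, and more.
To validate the hardware, complete the following steps: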
On the bootstrapper machine, run the validation CLI command with root access by using sudo:
sudo RELEASE_DIR/gdcloud system check-config --config CELL_CONFIG_PATH --artifacts-directory ARTIFACTS_DIR --scenario ConfigCheck
This command writes all logs to ARTIFACTS_DIR.
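If the validation reports any errors, fix every issue according to the error messages, then rerun the validation.
If all reports show a healthy status, continue to the next step.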
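The invocation in the step above can be assembled programmatically, for example when the validation is rerun several times after fixes. The following is a minimal sketch, not part of the release tooling: the function name build_check_config_command is hypothetical, and only the flags shown in the documented command are used.

```python
import shlex


def build_check_config_command(release_dir, cell_config_path, artifacts_dir):
    """Return the argv list for the documented check-config invocation.

    Hypothetical helper: it only reproduces the flags shown in this section.
    """
    return [
        "sudo",
        f"{release_dir.rstrip('/')}/gdcloud",
        "system",
        "check-config",
        "--config",
        cell_config_path,
        "--artifacts-directory",
        artifacts_dir,
        "--scenario",
        "ConfigCheck",
    ]


if __name__ == "__main__":
    # Placeholder values; substitute the real paths for your environment.
    argv = build_check_config_command(
        "RELEASE_DIR", "CELL_CONFIG_PATH", "ARTIFACTS_DIR"
    )
    # Print the command for review. To execute it instead, the list could be
    # passed to subprocess.run(argv, check=True).
    print(shlex.join(argv))
```

Building the command as a list rather than a single string avoids shell quoting problems when a path contains spaces.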
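18.2. Potential issues
This section lists issues that you might encounter during post-installation validation of a Distributed Cloud instance.
18.2.1. Potential issues in all Google Distributed Cloud versions
18.2.1.1. Network check incorrectly flags storage devices connected to a patch panel
Issue:
The check fails with the summary text: Storage network connection mismatched
The detail text looks like the following: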
Got: xx-ab-stge01-01:e0g<>xx-ab-torsw02 (:::::):Ethernet1/1/1,
want: expected: xx-ab-stge01-01:e0g<>xx-ab-ppl01:r04Ap01BO-ft
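The key symptom is the second part of the check, which contains a patch panel label such as r04Ap01BO-ft.
Workaround:
In the assets/inv/inv-core.yaml file, manually inspect the cell CR:
Using the failure example: Got: xx-ab-stge01-01:e0g<>xx-ab-torsw02 (:::::):Ethernet1/1/1,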
want: expected: xx-ab-stge01-01:e0g<>xx-ab-ppl01:r04Ap01BO-ft
- Confirm that there is an entry that names both the storage device and the patch panel.
For example, xx-ab-stge01-01:e0g<>xx-ab-ppl01:r04Ap01BO-ft becomes:
- cableType: MMF
  color: Aqua
  endA: xx-ab-stge01-01:e0g
  endATransceiverMPN: X65404-N-C
  endB: xx-ab-ppl01:r04Ap01BO-ft
  length: 2m
  mpn: 'OM4LCDX #40220 (2m)'
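- Confirm that the corresponding patch panel port links to the named TOR switch.
To find the other side of the patch panel, take r04Ap01BO-ft and, in the first part (the part with r and digits), change -ft to -bk. Both r04Ap01BO-ft and r04Ap02BO-ft correspond to r04Ap01BO-bk.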
- cableType: MMF
  color: Magenta
  endA: xx-ab-torsw02:Eth1/1
  endATransceiverMPN: QSFP-100G-SL4
  endB: xx-ab-ppl01:r04Ap01BO-bk
  length: 1.5m
  mpn: '12FMTPOM4 #73704 (1.5m)'
  notes: 25Gb breakout
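The other end of the cable entry must match the first part of the check. In this example:
Ethernet1/1/1 means that torsw02, on physical port 1, connects to the first breakout through a breakout box.
If the mapping is correct, you can ignore this check.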
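The manual cross-check above can be sketched in code. The following is a minimal illustration, not a tool shipped with the release: the function names are hypothetical, the cable entries are copied from the example above, and the caller supplies the back-side panel label because the front-to-back mapping rule is only partly described here. The sketch compares switch hostnames only and does not interpret port or breakout numbering.

```python
import re

# Matches the "Got: ..., want: expected: ..." detail text of the failed check.
DETAIL_RE = re.compile(
    r"Got:\s*(?P<got>.+?),\s*want:\s*expected:\s*(?P<want>\S+)", re.DOTALL
)

# Cable entries copied from the inv-core.yaml example in this section.
EXAMPLE_CABLES = [
    {"endA": "xx-ab-stge01-01:e0g", "endB": "xx-ab-ppl01:r04Ap01BO-ft"},
    {"endA": "xx-ab-torsw02:Eth1/1", "endB": "xx-ab-ppl01:r04Ap01BO-bk"},
]


def parse_mismatch_detail(detail):
    """Split the detail text into its observed and expected endpoints."""
    match = DETAIL_RE.search(detail)
    if match is None:
        return None
    got_a, got_b = match.group("got").split("<>", 1)
    want_a, want_b = match.group("want").split("<>", 1)
    return {
        "got_a": got_a.strip(),
        # Hostname is the first token, before the parenthesised MAC address.
        "got_switch": got_b.split()[0],
        # Port is whatever follows the last colon.
        "got_port": got_b.rsplit(":", 1)[1].strip(),
        "want_a": want_a.strip(),
        # Drop a trailing quote left over from quoted YAML output.
        "want_b": want_b.strip().rstrip("'"),
    }


def patch_panel_path_ok(parsed, cables, back_label):
    """Check both cable entries: storage to panel front, switch to panel back."""
    front = any(
        c["endA"] == parsed["want_a"] and c["endB"] == parsed["want_b"]
        for c in cables
    )
    back = any(
        c["endB"] == back_label
        and c["endA"].split(":", 1)[0] == parsed["got_switch"]
        for c in cables
    )
    return front and back
```

If patch_panel_path_ok returns True for the failing link, the inventory describes the same path that the check observed, which is the condition under which this section says the check can be ignored.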
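18.2.1.2. Object storage site reconcile error (malformed DNS suffix)
Issue:
The ObjectStorageSite custom resource is set to Ready: false, and its logs report Reconcile error, retrying: failed to parse location, found malformed DNSSuffix.
Workaround:
Ignore these errors. They disappear after the installation completes the "root admin cluster bootstrap" step.
18.2.1.3. Incorrect bare metal machine settings for the root admin cluster
Example of the failure in the validation output: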
- passed: false
description: |-
BMM setting validation on server xx-yy-bm01 failed with error:
server has unexpected settings:
/redfish/v1/Systems/1/SecureBoot.SecureBootEnable is true, want false
target: xx-yy-bm01
targettype: ServerSettings
vendorerrorcode: SERVER_TEST_FAIL(0x04)
gpcerrorcode: FailedInBMMSetting
mitigation: Refer to the artifact to see which server flags. Check the connection
to the server iLO port. Check the account of iLO. Check if the iLO and server
are fully powered up. Check the concerned settings of server ah-ab-bm01.
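To see at a glance which server settings the check flagged, the description text can be scanned for lines of the form "SETTING is ACTUAL, want EXPECTED". The following is a minimal sketch that assumes the line format shown in the example above; the function name is hypothetical.

```python
import re

# One flagged setting per line: "<setting path> is <actual>, want <expected>".
SETTING_RE = re.compile(
    r"^\s*(?P<path>\S+) is (?P<actual>\S+), want (?P<expected>\S+)\s*$",
    re.MULTILINE,
)


def unexpected_settings(description):
    """Return (path, actual, expected) tuples found in the description text."""
    return [
        (m.group("path"), m.group("actual"), m.group("expected"))
        for m in SETTING_RE.finditer(description)
    ]
```

For the example output, this yields a single tuple that names the SecureBootEnable setting with the actual value true and the expected value false.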
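18.2.1.4. Patch panel mismatch
Issue:
The hardware check should target the device at the far end of the connection, not the directly connected device (xx-xx-ppl).
Example: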
- description: This check validates the storage network connection against the cell
configuration.
target: xx-yy-stge01-01:e0e<>xx-yy-torsw01 (aa:aa:aa:aa:aa:aa):Ethernet1/1/1
targettype: ""
checkresult:
passed: false
summary: Storage network connection mismatched.
detail: 'Got: xx-yy-stge01-01:e0e<>xx-yy-torsw01 (aa:aa:aa:aa:aa:aa):Ethernet1/1/1,
want: expected: xx-yy-stge01-01:e0e<>xx-yy-ppl01:r03Ap01BO-ft'
vendorerrorcode: ""
errorcode: VAL-E3026
mitigation: If this check fails, it can indicate that the Storage system is not
configurated according to the configuration file. Adjust the cabling so it matches
with the cell configuration.
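Workaround:
Ignore the error.
18.2.1.5. Ping test fails
Issue:
This is normal behavior for CDP: an ARP flood must occur to populate the CAM table on the switch and reach the device. The first 1 to 5 packets are very likely to be lost.
Example: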
- description: This check validates the link quality from the management switches
to other switches and baremetal node by measuring the packet delivery ratio of
100 ping requests.
target: xx-yy-mgmtsw01
targettype: ManagementSwitch
checkresult:
passed: false
summary: Link quality from ManagementSwitch to other devices is degraded.
detail: |-
Check the cable connections of management switch xx-yy-mgmtsw01.
Error:
ping test failed on link xx-yy-mgmtsw01:Eth1/52<>xx-yz-mgmtaggsw01:Eth1/1 with 1 packets dropped in 100 packets send
ping test failed on link xx-yy-mgmtsw01:Eth1/32<>xx-yy-aggsw01:mgmt0 with 1 packets dropped in 100 packets send
ping test failed on link xx-yy-mgmtsw01:Eth1/36<>xx-yy-mgmtaggsw01:mgmt0 with 1 packets dropped in 100 packets send
ping test failed on link xx-yy-mgmtsw01:Eth1/41<>xx-yy-torsw02:mgmt0 with 1 packets dropped in 100 packets send
ping test failed on link xx-yy-mgmtsw01:Eth1/42<>xx-yy-torsw01:mgmt0 with 1 packets dropped in 100 packets send
ping test failed on link xx-yy-mgmtsw01:Eth1/51<>xx-yy-mgmtaggsw01:Eth1/1 with 1 packets dropped in 100 packets send
ping test failed on link xx-yy-mgmtsw01:Eth1/45<>xx-yy-base02:ilo with 1 packets dropped in 100 packets send
ping test failed on link xx-yy-mgmtsw01:Eth1/46<>xx-yy-base03:ilo with 1 packets dropped in 100 packets send
ping test failed on link xx-yy-mgmtsw01:Eth1/24<>xx-yy-base03:LOM1 with 1 packets dropped in 100 packets send.
vendorerrorcode: SWITCH_TEST_FAIL(0x01)
errorcode: VAL-E1003
mitigation: If this check failed, it usually means the network cables from the
management switch need to be inspected or replaced. Check the artifacts directory
or stdout to see which cable flagged.
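Workaround:
Ignore the error.
18.2.1.6. ONTAP storage cluster name check
Issue:
The automation looks for the ONTAP device hostname, but the ONTAP device appears on the switch under its serial number instead.
Example: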
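When the detail text lists many links, it can help to separate drops that fall within the initial 1 to 5 packet loss described above from larger losses. The following is a minimal sketch that assumes the "ping test failed on link" line format shown in the example. The function name and the decision to treat more than 5 dropped packets as worth a closer look are assumptions for illustration, not behavior of the validation CLI.

```python
import re

PING_RE = re.compile(
    r"ping test failed on link (?P<link>\S+) with (?P<dropped>\d+) packets "
    r"dropped in (?P<sent>\d+) packets send"
)

# Upper bound of the initial loss that this section describes as likely.
EXPECTED_INITIAL_LOSS = 5


def split_ping_failures(detail, max_expected=EXPECTED_INITIAL_LOSS):
    """Return (expected, suspicious) lists of (link, dropped, sent) tuples."""
    expected, suspicious = [], []
    for m in PING_RE.finditer(detail):
        entry = (m.group("link"), int(m.group("dropped")), int(m.group("sent")))
        if entry[1] <= max_expected:
            expected.append(entry)
        else:
            suspicious.append(entry)
    return expected, suspicious
```

For the example output, every link lost 1 packet out of 100, so all entries land in the expected list.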
- description: This check validates the storage cluster name and management interface
are consistent between netapp ontap client and the cell configuration.
target: yy-stge-clus-01
targettype: StorageCluster
checkresult:
passed: false
summary: StorageCluster management interface cannot be found.
detail: StorageCluster management interface x.x.x.x in the cell configuration
cannot be found in the netapp ontap client.
vendorerrorcode: STORAGE_TEST_FAIL(0x03)
errorcode: VAL-E3007
mitigation: Review if management IPfor StorageCluster yy-stge-clus-01 in the cell
configuration is correct.
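Workaround:
Ignore the error.
18.2.1.7. Bootstrapper LLDP Discovery Fail
Issue:
show lldp neighbors cannot find the bootstrapper from the TOR switch. This appears to happen because the OS on the bootstrapper (Ubuntu) does not respond to LLDP requests.
Example: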
- description: This check validates the connection between TorSwitch and Server. The
connection is retriveved via "show lldp neighbors" and cross check with the MAC
address for NIC port from Server defined in the cell configuration.
target: xx-yy-torsw02
targettype: TORSwitch
checkresult:
passed: false
summary: Connection between TorSwitch and Server does not match with the cell
configuration.
detail: |-
Check the cable connection between TorSwitch and Server.
Error:
the BM server port xx-yy-bm15:s1p2 could not be found in the rack. Check if the server xx-yy-bm15 is powered up. If the server is powered up, check the cell.yaml file to see if the connection to switch port xx-yy-torsw02:Eth1/10/2 comply with the rack mount
vendorerrorcode: SWITCH_TEST_FAIL(0x01)
errorcode: VAL-E1001
mitigation: If this check failed, it usually means the connection from TorSwitch
to Server does not match the cell configuration. Or the Server has the wrong
MAC address for NIC port in the cell configuration. Check the artifacts directory
or stdout to see which connection flagged.
Workaround:
Confirm that the connection between the TOR switch and the bootstrapper is in place by running show mac address-table on the TOR switch.
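When comparing the bootstrapper NIC MAC address from the cell configuration with the switch output, note that switches may print MAC addresses in a different notation, for example dotted groups instead of colon-separated bytes. The following is a minimal sketch of a notation-independent comparison. The function names are hypothetical, and the table text is treated as opaque whitespace-separated tokens because the exact output layout depends on the switch operating system.

```python
import re


def normalize_mac(mac):
    """Reduce a MAC address to 12 lowercase hex digits, whatever the separators."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def mac_in_table(mac, table_text):
    """Return True if any token of the pasted table output equals the MAC."""
    target = normalize_mac(mac)
    for token in table_text.split():
        # Strip only common MAC separators so other tokens stay distinct.
        if re.sub(r"[.:\-]", "", token).lower() == target:
            return True
    return False
```

The table text would be the output of show mac address-table copied from the switch session; the sketch does not connect to the switch itself.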