## NEC's OpenPOWER Initiatives

##### The OpenPOWER landscape as seen by an infrastructure software engineer

2018.09.03 Miyamoto Kazuyuki

-----

## Working on the HA clustering software EXPRESSCLUSTER (CLUSTERPRO, 1996~)

-----

# Computer Cluster

- HPC
  - DEC VAXcluster (1984)
  - Beowulf (1994), MPI (1994), SCore (2001)
- HA
  - Tandem Himalaya (1994)
  - IBM S/390 Parallel Sysplex (1994)

-----

# HA Cluster

- Load Balance
  - ipvsadm, lvs
- Failover
  - Hot standby
    - Stratus everRun
    - VMware FT
  - Cold standby
    - Micro Focus PlateSpin
  - **Warm standby**

-----

## HA Cluster Software (commercial)

- UNIX
  - IBM PowerHA (HACMP 1991)
  - HP Serviceguard (1990)
  - Oracle Solaris Cluster (Sun Cluster 2.0 1997)
- IA
  - Veritas Cluster Server
  - Microsoft WSFC (WinNT4.0 MSCS 1996)
  - SIOS LifeKeeper (1999)
  - NEC CLUSTERPRO (1996)

-----

## Configuration types of HA Cluster

![Shared disk type](https://jpn.nec.com/clusterpro/clp/img/function_6.png)

-----

## Configuration types of HA Cluster

![block device replication type](https://jpn.nec.com/clusterpro/clp/img/function_7.png)

-----

## Configuration types of HA Cluster

![block device replication type](https://jpn.nec.com/clusterpro/clp/img/function_10.png)

-----

## How I got involved with **POWER**

Around **2007-2008**, there was a push to expand to other platforms, building on the CLUSTERPRO for Linux codebase.

-----

#### HP

Existing: MC/Service Guard

- pro : foo
- con : bar

-----

#### Sun (Oracle)

Existing: Sun Cluster

- pro : No political problems. [foo hoge bar](complicated, because SI requires certified engineers)
- con : Machines were the bottleneck for SPARC development
- pro : 2008.05 OpenSolaris initial release
- con : 2010.08.13 the "OpenSolaris is officially now dead." incident

result : CLUSTERPRO X for Solaris (x86)

-----

#### IBM

Existing: PowerHA

- pro : No political problems
- pro : With IBM's cooperation, machines were not a bottleneck either

result : CLUSTERPRO for *Linux on POWER* (CLUSTERPRO for *Linux on z*)

-----

## How I got involved with **OpenPOWER**

### ExpEther started in 2012

![ExpEther](http://www.expether.org/images/ee_basic.png?crc=309300388)

-----

## A problem the ExpEther team faced in 2014

SPoF (single point of failure)

![SPoF](http://expether.org/images/multihost.png?crc=4252991000)

solution

graph TD
subgraph CLP
  EEM1(EEM<br>active)
end
subgraph CLP
  EEM2(EEM<br>standby)
end
subgraph Xbox
  NVMe
end
SW[40G SW]
style EEM2 fill:#ccf,stroke:#f66,stroke-width:2px,stroke-dasharray: 5, 5
EEM1 --- SW
EEM2 --- SW
Other[Other Servers] --- SW
SW --- NVMe

solution

graph TD
subgraph CLP
  EEM1(EEM<br>active)
end
subgraph CLP
  EEM2(EEM<br>standby)
end
subgraph Xbox
  NVMe
end
%%Other[Other Servers]
SW1[40G SW]
SW2[40G SW]
style EEM2 fill:#ccf,stroke:#f66,stroke-width:2px,stroke-dasharray: 5, 5
EEM1 --- SW1
EEM2 --- SW1
EEM1 --- SW2
EEM2 --- SW2
%%Other --- SW1
%%Other --- SW2
SW1 --- NVMe
SW2 --- NVMe
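The difference between the single-switch and dual-switch diagrams can be checked mechanically. The sketch below is only an illustration, not part of ExpEther or CLUSTERPRO. It assumes the EEM nodes are the managers that must keep a path to the NVMe device, and it treats a node as a SPoF when losing it leaves no surviving manager connected to that device. The function names `reachable` and `single_points_of_failure` are hypothetical.

```python
from collections import deque


def reachable(edges, start, removed):
    """Return the nodes reachable from start when the removed node is down."""
    if start == removed:
        return set()
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour != removed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def single_points_of_failure(edges, managers, target):
    """List nodes (other than target) whose loss cuts every manager off from target."""
    candidates = {node for edge in edges for node in edge} - {target}
    spofs = []
    for candidate in sorted(candidates):
        survivors = [m for m in managers if m != candidate]
        if not any(target in reachable(edges, m, candidate) for m in survivors):
            spofs.append(candidate)
    return spofs


# Topologies transcribed from the two diagrams (undirected links).
single_switch = [("EEM1", "SW"), ("EEM2", "SW"), ("Other", "SW"), ("SW", "NVMe")]
dual_switch = [
    ("EEM1", "SW1"), ("EEM2", "SW1"),
    ("EEM1", "SW2"), ("EEM2", "SW2"),
    ("SW1", "NVMe"), ("SW2", "NVMe"),
]

print(single_points_of_failure(single_switch, ["EEM1", "EEM2"], "NVMe"))  # ['SW']
print(single_points_of_failure(dual_switch, ["EEM1", "EEM2"], "NVMe"))    # []
```

Under these assumptions, clustering the EEM removes the manager as a SPoF, but the 40G switch remains one until it is duplicated as well.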

Approach by HA Cluster

graph TD
subgraph Xbox
  NVM((NVMe SSD))
  GPU1(GPU Card)
  GPU2(GPU Card)
end
Node-E --- NVM
Node-F --- NVM
Node-G --- GPU1
%% Node-G --- GPU2
Node-H --- GPU1
%% Node-H --- GPU2
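As a rough illustration of the diagram above, the following sketch models one device in the Xbox being owned by one node at a time, with ownership moving to the other node when heartbeats stop. This is a generic heartbeat and failover toy, not CLUSTERPRO's actual mechanism. The class `FailoverGroup`, its methods, and the 3-second timeout are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FailoverGroup:
    """One shared resource (e.g. the NVMe SSD) owned by a single node at a time."""
    resource: str
    nodes: List[str]                  # failover priority, highest first
    timeout: float = 3.0              # seconds of silence before a node counts as down
    last_seen: Dict[str, float] = field(default_factory=dict)
    active: Optional[str] = None

    def heartbeat(self, node: str, now: float) -> None:
        """Record that a node reported in at time now."""
        self.last_seen[node] = now

    def is_alive(self, node: str, now: float) -> bool:
        seen = self.last_seen.get(node)
        return seen is not None and now - seen <= self.timeout

    def evaluate(self, now: float) -> Optional[str]:
        """Keep the current owner while it is alive; otherwise pick the first live node."""
        if self.active is not None and self.is_alive(self.active, now):
            return self.active
        self.active = next((n for n in self.nodes if self.is_alive(n, now)), None)
        return self.active


group = FailoverGroup("NVMe SSD", ["Node-E", "Node-F"])
group.heartbeat("Node-E", now=0.0)
group.heartbeat("Node-F", now=0.0)
print(group.evaluate(now=1.0))   # Node-E

# Node-E goes silent; only Node-F keeps sending heartbeats.
group.heartbeat("Node-F", now=4.0)
print(group.evaluate(now=5.0))   # Node-F
```

In this sketch the resource stays with Node-F even after Node-E comes back, which avoids a second interruption; whether to fail back automatically is a policy choice.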
# Reality of the OP world

### What it is targeting (or says it is targeting)

- High Performance Computing
- Distributed Computing (AI)
  - GPU, NVLink
  - CAPI, PCIe Gen4
- Peripheral Device
  - Good news for ODMs?

-----

## Who uses it, and how

- High Performance Computing
  - Univ, Manufacturing, Drug discovery ...
- Distributed Computing
  - Are the major OpenPOWER users Google and Tencent?
  - They seem likely to build the HW, OS, APP, and Algo they need themselves
- Deep Learning
  - A domain where only performance and scalability matter
- Erasure coding ?
  - A domain where density (threads per socket) matters?

-----

## Reality of the OP world

### What it is not looking at (or seems not to be)

- Personal
  - Competing against Arm?
- Line of Business
  - Traditional enterprise (DBMS, APS, FS, Batch)
  - *SAP HANA* might be the exception
  - Competing against IA?
- Options other than Linux

-----

## An idea that came up along with the new movement

GPU computing power as a service

- Amazon, MS, IBM, Oracle, Fujitsu ...

GPU, an asset to be protected

- Failover of GPUs in the Xbox
- GPU hotplug is the issue
  - Driver support
  - eGPU

-----

## Where is the fun?

### Finding a playground in the areas of HPC and GPU computing

Small: ARM, medium: Xeon, large: POWER ???

Something fun involving OP ???

else ???