This repository has been archived by the owner on Jan 24, 2023. It is now read-only.

Build OPAE for PACN300 Worker image failing #94

Open
ravicorning opened this issue Mar 9, 2021 · 2 comments

Comments

@ravicorning

I am trying to enable FPGA virtualization using the flexran flavor; the host has an N3000 acceleration card.
I copied the OPAE_SDK_1.3.7-5_el7.zip file into the appropriate directory and can see it being copied to the worker node. The last script logs are below.

Any idea why it might be failing? I was having issues replacing my worker node's 3.10.0-1127.el7.x86_64 kernel with an RT one, so I disabled the kernel replacement in the group_vars/* scripts.

2021-03-08 17:54:17,161 p=36551 u=ravi n=ansible | TASK [opae_fpga/node : build OPAE for PACN3000 FPGA worker image] **************************************************************************************************************************************************
2021-03-08 17:58:34,668 p=36551 u=ravi n=ansible | fatal: [node01]: FAILED! => {
"changed": true,
"cmd": [
"make",
"fpga-opae"
],
"delta": "0:04:17.269720",
"end": "2021-03-09 13:57:54.851411",
"rc": 2,
"start": "2021-03-09 13:53:37.581691"
}

STDOUT:

docker build -t fpga-opae-pacn3000:1.0 -f ./dist/fpga_opae/Dockerfile ./dist/fpga_opae/
Sending build context to Docker daemon 8.372MB
Step 1/28 : FROM centos:7.8.2003
---> afb6fca791e0
Step 2/28 : WORKDIR /root/opae
---> Using cache
---> a39330ecc30e
Step 3/28 : ENV http_proxy=$http_proxy
---> Using cache
---> 651374cefcc8
Step 4/28 : ENV https_proxy=$https_proxy
---> Using cache
---> 880f0867908d
Step 5/28 : RUN yum install -y gcc gcc-c++ cmake make autoconf automake libxml2 libxml2-devel json-c-devel boost ncurses ncurses-devel ncurses-libs boost-devel libuuid libuuid-devel python2-jsonschema doxygen hwloc-devel libpng12 rsync bc python-devel python-libs python-sphinx unzip which wget python36 epel-release sudo
---> Using cache
---> f1ee31287176
Step 6/28 : RUN easy_install pip==20.3.3 && pip install intelhex
---> Using cache
---> f55f0cc8aca3
Step 7/28 : RUN wget http://linuxsoft.cern.ch/cern/centos/7/rt/CentOS-RT.repo -O /etc/yum.repos.d/CentOS-RT.repo
---> Using cache
---> 4eb96bf07ac0
Step 8/28 : RUN wget http://linuxsoft.cern.ch/cern/centos/7/os/x86_64/RPM-GPG-KEY-cern -O /etc/pki/rpm-gpg/RPM-GPG-KEY-cern
---> Using cache
---> a5f5096b203a
Step 9/28 : RUN export isRT=$(uname -r | grep rt -c) && if [ $isRT = "1" ] ; then yum install -y "kernel-rt-devel-uname-r == $(uname -r)"; else yum install -y "kernel-devel-uname-r == $(uname -r)"; fi
---> Running in a3d94b53d570
Loaded plugins: fastestmirror, ovl
Loading mirror speeds from cached hostfile

 * base: mirror.keystealth.org
 * epel: mirror.sfo12.us.leaseweb.net
 * extras: centos-distro.cavecreek.net
 * updates: mirrors.xtom.com
No package kernel-devel-uname-r == 3.10.0-1127.el7.x86_64 available.
Error: Nothing to do

STDERR:

The command '/bin/sh -c export isRT=$(uname -r | grep rt -c) && if [ $isRT = "1" ] ; then yum install -y "kernel-rt-devel-uname-r == $(uname -r)"; else yum install -y "kernel-devel-uname-r == $(uname -r)"; fi' returned a non-zero code: 1
make: *** [fpga-opae] Error 1

MSG:

non-zero return code

2021-03-08 17:58:34,672 p=36551 u=ravi n=ansible | PLAY RECAP *********************************************************************************************************************************************************************************************************
2021-03-08 17:58:34,673 p=36551 u=ravi n=ansible | node01 : ok=122 changed=26 unreachable=0 failed=1 skipped=130 rescued=0 ignored=2
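
For readers following the log: Step 9 derives the kernel-devel package name from the build host's kernel release, and yum then reports that no such package is available in the enabled repos. One possible cause (an assumption, not confirmed anywhere in this thread) is that the mirrors no longer carry that exact kernel version. Below is a minimal sketch of the selection logic from Step 9; the helper name `kernel_devel_pkg` is hypothetical and not part of the OpenNESS scripts.

```shell
#!/bin/sh
# Hypothetical helper (not from the repository): print the yum package spec
# that Step 9 of the Dockerfile would request for a given kernel release.
kernel_devel_pkg() {
    release="$1"
    # Same test as the Dockerfile: count the lines containing "rt".
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    is_rt=$(printf '%s\n' "$release" | grep -c rt || true)
    if [ "$is_rt" = "1" ]; then
        echo "kernel-rt-devel-uname-r == $release"
    else
        echo "kernel-devel-uname-r == $release"
    fi
}

# Show what would be requested for the kernel of the current host.
kernel_devel_pkg "$(uname -r)"
```

With the printed name in hand, running `yum --showduplicates list available kernel-devel` inside the build container should show which versions the configured repos actually offer, which can then be compared against the host's `uname -r`.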

@aniket-intel

Hi Ravi,

Could you please elaborate on the issue you were facing while upgrading the kernel of your worker node to RT, and if possible send the logs for the same?

@ravicorning
Author

For now I'm setting Kernel_skip = false for both the controller and the worker node. The issue was that the installation booted the worker node with the new RT kernel, but on the edge node the server wouldn't come up; it was left in a bad state.
But I guess the RT kernel is not a prerequisite for testing the basic SR-IOV and FPGA virtualization features, right?

The worker node is on 3.10.0-1127.el7.x86_64. I was able to bypass the check that was failing in the log above, and the installation went fine. But now I cannot update the FPGA using the 'kubectl rsu ..' update commands. I guess the fpga_opae container is not running on the worker node.
