[Slackbuilds-users] openmpi request

Karel Venken k.venken at online.be
Thu Jul 25 15:54:25 UTC 2019


Emmanuel wrote:
>
>
> On Thu, Jul 25, 2019 at 5:04 AM Robby Workman
> <rworkman at slackbuilds.org> wrote:
>
>     On Thu, 25 Jul 2019 09:58:03 +0200
>     Karel Venken <kava0418 at online.be> wrote:
>
>     > Hi,
>     >
>     > For installing our cluster we need to add the --with-pmi=pmi2
>     > configure option to openmpi.SlackBuild. So it becomes:
>     >
>     > ./configure \
>     >    --prefix=/usr \
>     >    --sysconfdir=/etc \
>     >    --localstatedir=/var/lib \
>     >    --mandir=/usr/man/ \
>     >    --enable-mpi1-compatibility \
>     >    --docdir=/usr/doc/$PRGNAM-$VERSION \
>     >    --disable-static \
>     >    --libdir=/usr/lib${LIBDIRSUFFIX} \
>     >    --build=$ARCH-slackware-linux \
>     >    --with-pmi=pmi2
>     >
>     >
>     > The background is to use MPI with slurm and a NUMA kernel, which we
>     > build ourselves. Without this parameter openmpi crashes. Would this
>     > be an option?
>
>
>     CCing SBo maintainer of openmpi; if there's no response and/or an
>     update with that fixed within a few weeks, follow up with us and
>     we'll handle it directly.
>
>     -RW
>
>
> Hi Karel,
>
> I'm the maintainer of openmpi and slurm, let me try this parameter in 
> my cluster because we haven't had issues with the current package and 
> slurm (and also with several versions of openmpi, 1.8.x, 1.10.x, 
> 2.1.1). Can you send me the exact error? Have you modified the slurm
> build script to add --with-pmi? Are you running mpirun in the slurm
> job submission script, or using srun?
>
> In any case, I will submit a new version of the script in the next few 
> days.
>

Hi Emmanuel,

Thanks for answering so soon. I added the optional dependencies numactl,
hwloc and rrdtool to slurm and, of course, set HWLOC=yes RRDTOOL=yes in
the environment when building.
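
For reference, a minimal sketch of that build invocation. It assumes
slurm.SlackBuild and the slurm source tarball are in the current directory
and that hwloc and rrdtool are already installed; the wrapper name
build_slurm is only for illustration.

```shell
# Sketch: build slurm with the optional hwloc and rrdtool support switched on.
# build_slurm is a hypothetical wrapper, not part of the SBo script.
build_slurm() {
    if [ ! -f ./slurm.SlackBuild ]; then
        echo "slurm.SlackBuild not found in $(pwd)" >&2
        return 1
    fi
    # Variables set on the command line are visible only to this one run.
    HWLOC=yes RRDTOOL=yes sh ./slurm.SlackBuild
}
```

Running build_slurm as root from the SlackBuild directory should then
behave like calling the script by hand with both variables set.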

(We also integrate slurm with ganglia, but that's beside the point
here; I mention it only because we activated rrdtool there as well.)

The error was produced by one of our applications, which warned about NUMA
and then crashed or hung at the MPI request. Everything worked fine once
we rebuilt openmpi with this option. (I have had a discussion in the
Slackware newsgroup about NUMA.)

I am sorry that I didn't keep the log of the application.

FWIW, to allow this application to use memory shared over different nodes
we also had to recompile the kernel with the NUMA option enabled (the stock
kernel has it turned off but, if I am correct, the current version has
it activated).
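
Whether a given kernel was built with NUMA support can be checked from its
config file. A minimal sketch, assuming the config is available as plain
text (the /boot/config path below is an assumption and may differ on your
system); the helper name numa_enabled is only for illustration.

```shell
# Sketch: report whether a kernel config file has NUMA support built in.
# numa_enabled is a hypothetical helper; the caller passes the config path.
numa_enabled() {
    # CONFIG_NUMA=y means NUMA support is compiled into the kernel.
    grep -q '^CONFIG_NUMA=y$' "$1"
}

# Example: check the config of the installed kernel, if the file exists.
cfg=/boot/config
if [ -f "$cfg" ]; then
    if numa_enabled "$cfg"; then
        echo "NUMA enabled"
    else
        echo "NUMA not enabled"
    fi
fi
```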

If this goes beyond what you can or want to investigate, that's OK. I am
already thankful that you want to give it a look. And, of course, if it is a
problem in version 14.2, we'll pick it up again if needed when a new
version arrives.

kind regards,

Karel.








