[Slackbuilds-users] openmpi request
Karel Venken
k.venken at online.be
Thu Jul 25 15:54:25 UTC 2019
Emmanuel wrote:
>
>
> On Thu, Jul 25, 2019 at 5:04 AM Robby Workman
> <rworkman at slackbuilds.org> wrote:
>
> On Thu, 25 Jul 2019 09:58:03 +0200
> Karel Venken <kava0418 at online.be> wrote:
>
> > Hi,
> >
> > For installing our cluster we need to add the --with-pmi=pmi2 configure
> > option to openmpi.SlackBuild. The configure call then becomes:
> >
> > ./configure \
> > --prefix=/usr \
> > --sysconfdir=/etc \
> > --localstatedir=/var/lib \
> > --mandir=/usr/man/ \
> > --enable-mpi1-compatibility \
> > --docdir=/usr/doc/$PRGNAM-$VERSION \
> > --disable-static \
> > --libdir=/usr/lib${LIBDIRSUFFIX} \
> > --build=$ARCH-slackware-linux \
> > --with-pmi=pmi2
> >
> >
> > The background is that we use MPI with slurm and a NUMA kernel that we
> > build ourselves. Without this parameter, openmpi crashes. Would this be
> > an option?
>
>
> CCing the SBo maintainer of openmpi; if there's no response and/or no
> update with that fix within a few weeks, follow up with us and
> we'll handle it directly.
>
> -RW
>
>
> Hi Karel,
>
> I'm the maintainer of openmpi and slurm. Let me try this parameter on
> my cluster, because we haven't had issues with the current package and
> slurm (nor with several earlier versions of openmpi: 1.8.x, 1.10.x and
> 2.1.1). Can you send me the exact error? Have you modified the slurm
> build script to add --with-pmi? Are you running mpirun in the slurm
> submit job script, or srun?
>
> In any case, I will submit a new version of the script in the next few
> days.
>
Hi Emmanuel,
Thanks for answering so soon. I added the optional dependencies numactl,
hwloc and rrdtool to slurm, and for building I of course set the
environment with HWLOC=yes RRDTOOL=yes.
(We also integrate slurm with ganglia, but that's beside the point
here; I only mention it because we activated rrdtool there as well.)
One of our applications printed a warning about NUMA and then crashed or
hung at the MPI request. Everything worked fine once we changed this
compile option. (I have had a discussion in the slackware newsgroup about
NUMA.)
I am sorry that I didn't keep the application's log.
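A minimal sketch of that local change, assuming a copy of openmpi.SlackBuild whose configure call ends with the `--build=$ARCH-slackware-linux` line shown in the quoted script above. The function name `add_pmi2` is hypothetical, and the sketch relies on GNU sed (`-i` and `\n` in the replacement). If `--build=...` is not the last option in your copy, the pattern will not match and the file is left unchanged.

```shell
# add_pmi2 FILE: hypothetical helper that appends "--with-pmi=pmi2" after the
# "--build=$ARCH-slackware-linux" line, assuming that line ends the configure call.
# Requires GNU sed. Running it twice is harmless: after the first run the
# --build line ends with " \", so the pattern no longer matches.
add_pmi2() {
  sed -i 's|--build=\$ARCH-slackware-linux$|--build=$ARCH-slackware-linux \\\n  --with-pmi=pmi2|' "$1"
}

# Usage (from the directory holding the SlackBuild):
#   add_pmi2 openmpi.SlackBuild
```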
FWIW, to allow this application to use memory shared across different
nodes, we also had to recompile the kernel with the NUMA option enabled
(the stock kernel has it turned off, but, if I am correct, the current
version has it activated).
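A rough way to check whether a kernel already has NUMA enabled before rebuilding it, assuming the kernel exposes its configuration as `/proc/config.gz` (this only exists when the kernel was built with CONFIG_IKCONFIG_PROC). The helper name `kernel_has_numa` is hypothetical; CONFIG_NUMA is the kernel option in question.

```shell
# kernel_has_numa [CONFIG_GZ]: hypothetical helper; succeeds if the given
# gzip-compressed kernel config sets CONFIG_NUMA=y.
# Defaults to /proc/config.gz, which needs CONFIG_IKCONFIG_PROC in the kernel.
kernel_has_numa() {
  config=${1:-/proc/config.gz}
  [ -r "$config" ] || return 1
  gzip -dc < "$config" | grep -q '^CONFIG_NUMA=y$'
}

# Usage on the running kernel:
if kernel_has_numa; then
  echo "running kernel has NUMA support"
else
  echo "no NUMA support found (or /proc/config.gz is missing)"
fi
```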
If this goes beyond what you can or want to investigate, that's OK. I am
already thankful that you want to take a look. And, of course, if it is a
problem in version 14.2, we'll pick it up again, if needed, when a new
version arrives.
Kind regards,
Karel.