batch system and scp/rcp headache

From: Martyn Wheeler (martyn_wheeler_at_myrealbox.com)
Date: 11/07/03


Date: Fri, 7 Nov 2003 21:13:19 -0000

Dear All,

I am trying to set up batch scheduling software (OpenPBS) on my new Linux
cluster and am running into a few communication problems.
I can get everything running on the server and the compute nodes. I then
submit a job to the server with
qsub testfile

where testfile contains:
#!/bin/sh
#PBS -o jobname.out -j oe -N jobname
#PBS -l cput=500:00:00,ncpus=2
#PBS -l mem=128mw,nodes=1
echo Working directory is $PBS_O_WORKDIR
echo tmp directory is $TMPDIR
echo env is $ENVIRONMENT
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
echo This job runs on the following processors:
echo `cat $PBS_NODEFILE`
echo pbs job id is $PBS_JOBID

test.sh > output

and the file test.sh contains the line
echo hello world

The job submits, but copying the script output back gives me these
errors:

11/07/2003 17:07:35;0080; pbs_mom;Fil;sys_copy;command: /usr/bin/scp -Br
11.arrakis..OU martynw@arrakis.local:/home/martynw/jobname.out status=1,
try=1
11/07/2003 17:08:06;0080; pbs_mom;Fil;sys_copy;command:
/opt/pbs/sbin/pbs_rcp -r 11.arrakis..OU
martynw@arrakis.local:/home/martynw/jobname.out status=1, try=2
11/07/2003 17:08:17;0080; pbs_mom;Fil;sys_copy;command: /usr/bin/scp -Br
11.arrakis..OU martynw@arrakis.local:/home/martynw/jobname.out status=1,
try=3
11/07/2003 17:08:48;0080; pbs_mom;Fil;sys_copy;command:
/opt/pbs/sbin/pbs_rcp -r 11.arrakis..OU
martynw@arrakis.local:/home/martynw/jobname.out status=1, try=4
11/07/2003 17:09:09;0004; pbs_mom;Fil;11.arrakis..OU;Unable to copy file
11.arrakis..OU to arrakis.local:/home/martynw/jobname.out
11/07/2003 17:09:09;0004; pbs_mom;Fil;11.arrakis..OU;connect to address
192.168.0.1: Connection refused
11/07/2003 17:09:09;0004; pbs_mom;Fil;11.arrakis..OU;Trying 192.168.0.2...
11/07/2003 17:09:09;0004; pbs_mom;Fil;11.arrakis..OU;connect to address
192.168.0.1: Connection refused
11/07/2003 17:09:09;0004; pbs_mom;Fil;11.arrakis..OU;Trying 192.168.0.2...
11/07/2003 17:09:09;0004; pbs_mom;Fil;11.arrakis..OU;connect to address
192.168.0.1: Connection refused
11/07/2003 17:09:09;0004; pbs_mom;Fil;11.arrakis..OU;Trying 192.168.0.2...
11/07/2003 17:09:09;0004; pbs_mom;Fil;11.arrakis..OU;connect to address
192.168.0.1: Connection refused
11/07/2003 17:09:09;0004; pbs_mom;Fil;11.arrakis..OU;Trying 192.168.0.2...
11/07/2003 17:09:09;0004; pbs_mom;Fil;11.arrakis..OU;connect to address
192.168.0.1: Connection refused
11/07/2003 17:09:09;0004; pbs_mom;Fil;11.arrakis..OU;Trying 192.168.0.2...
11/07/2003 17:09:09;0004; pbs_mom;Fil;11.arrakis..OU;connect to address
192.168.0.1: Connection refused
11/07/2003 17:09:09;0004; pbs_mom;Fil;11.arrakis..OU;Trying 192.168.0.2...
11/07/2003 17:09:09;0004; pbs_mom;Fil;11.arrakis..OU;arrakis.local:
Connection refused
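
To take PBS out of the picture, the copy that pbs_mom logged can be retried
by hand from a compute node. What follows is only a minimal sketch: the user,
host and destination path are taken from the log above, while the function
name build_copy_cmd, the RUN switch and the test file /tmp/copytest.txt are
hypothetical names made up for illustration.

```shell
#!/bin/sh
# Sketch: rebuild the scp command seen in the pbs_mom log so it can be tried
# by hand. build_copy_cmd and /tmp/copytest.txt are illustrative names only.
build_copy_cmd() {
    src=$1
    # Default destination is the one that appears in the log.
    dest=${2:-martynw@arrakis.local:/home/martynw/jobname.out}
    # -B: batch mode (never prompt for a password), -r: recursive, as logged.
    printf '/usr/bin/scp -Br %s %s\n' "$src" "$dest"
}

# Print the command by default; set RUN=1 to actually execute it.
cmd=$(build_copy_cmd /tmp/copytest.txt)
echo "$cmd"
if [ "${RUN:-0}" = 1 ]; then
    $cmd
    echo "exit status: $?"
fi
```

If the hand-run copy fails in the same way, the problem presumably lies in
the ssh/rsh setup or in name resolution rather than in PBS itself.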

I think it has to do with the way my network is set up, and that I need to
set up some hosts files, but I am not sure which.
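
The log seems to show arrakis.local being tried at both 192.168.0.1 and
192.168.0.2, so one thing worth checking is what a given hosts file maps the
name to. The following is a rough sketch, assuming the usual hosts-file
layout (address first, then names, with # starting a comment); the function
name lookup_hosts is hypothetical.

```shell
#!/bin/sh
# Sketch: list the addresses that a hosts-format file gives for a name.
# lookup_hosts is an illustrative helper, not part of PBS or any system tool.
lookup_hosts() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        { sub(/#.*/, "") }    # drop comments; awk re-splits the fields
        {
            # field 1 is the address, the remaining fields are names
            for (i = 2; i <= NF; i++)
                if ($i == n) { print $1; break }
        }
    ' "$file"
}

# Example (run on the server and on a compute node, then compare):
#   lookup_hosts arrakis.local
```

This only reads the file it is given; it does not consult DNS or any other
name service.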

On my server the compute nodes are listed in the hosts.equiv file, and on the
compute nodes the server is listed in the hosts.equiv file.

Does anyone have any ideas as to what might cause these problems?
I can provide more details if necessary.
Cheers for now,
Martyn