JAMES CLERK MAXWELL TELESCOPE - IEEE FAULT REPORT
Telescope tracking fault: 4th - 9th February 1999 (HST)
Summary
From the start of shift on Thursday 4th February 1999 until the end of
second shift on the morning of Tuesday 9th February, there was an intermittent
tracking problem with the JCMT. While the effect was present, for
one second in every ten, the demand to the servo would be delayed by
one second. The fault was systematic in nature, in the sense that it
would be present or absent for periods of approximately five hours
at a time (the precise period is uncertain). Further details are given below.
Details
The JCMT uses a number of IEEE busses to allow the telescope control
computer (MWTTEL) to communicate with the telescope micros and various
pieces of commercial equipment. The original MWTTEL had a Q-bus
backplane, and two IEEE interface cards plugged directly into
this. When MWTTEL was replaced with a Vaxstation model 90, the IEEE
interface cards were replaced by four SCSI-IEEE converters, each one
controlling a separate IEEE bus.
At the start of January, one of these devices failed, and was replaced
by a spare. Over time, the number of items connected via IEEE has
decreased. In order to release the spare again, the devices were rearranged
so that only three IEEE busses were required. However, at that date
the fourth SCSI-IEEE converter was left connected to the SCSI bus.
On Thursday 4th February HST, on my recommendation, the fourth
SCSI-IEEE converter was removed from MWTTEL. However, I did not
realise that the IEEE device driver for that device also had to be
disabled. The result of not disabling the driver was that every ten seconds
the device driver would poll the SCSI ID of the missing converter to
determine whether it had come back on line.
During normal operation of the JCMT, the Vax TEL task sends demand
encoder values to the Antenna Servo Micro (ASM) once per second. At
various times throughout the night, the device driver poll would
"collide" with the IEEE transfer to the antenna micro, causing this
transfer to take longer than a second to complete. The TEL task would
miss out one whole iteration of its 1Hz loop (so no demand would
be generated in the second following the delayed transfer). On the
next second, the TEL task would realise that one iteration had been
missed, and increment the "missed tick" counter, which is displayed on
the status screen.
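The missed-tick behaviour described above can be illustrated with a
minimal sketch. This is not the TEL task code: the function name
run_tel_loop, the transfer durations, and the bookkeeping are all
hypothetical, and the only behaviour taken from this report is that a
transfer lasting longer than one second causes the next 1Hz iteration to
be skipped, with the counter incremented on the iteration after that.

```python
def run_tel_loop(transfer_times):
    """Simulate a 1Hz demand loop (illustrative only).

    transfer_times[i] is the hypothetical duration, in seconds, of the
    IEEE transfer that would start at second i.
    Returns (ticks_sent, missed_ticks).
    """
    ticks_sent = []      # seconds at which a demand was actually generated
    missed_ticks = 0     # analogue of the "missed tick" counter
    busy_until = 0.0     # time at which the current transfer completes
    last_tick = None     # last second at which the loop actually ran

    for tick, duration in enumerate(transfer_times):
        if tick < busy_until:
            # Previous transfer still in progress: this iteration never runs,
            # so no demand is generated for this second.
            continue
        if last_tick is not None and tick - last_tick > 1:
            # The loop notices, one second late, that iterations were skipped.
            missed_ticks += tick - last_tick - 1
        ticks_sent.append(tick)
        busy_until = tick + duration
        last_tick = tick

    return ticks_sent, missed_ticks


# Normal transfers take ~50ms; one transfer (at second 3) is delayed by a
# collision with the driver poll and takes just over a second.
sent, missed = run_tel_loop([0.05, 0.05, 0.05, 1.02, 0.05, 0.05, 0.05])
print(sent, missed)
```

In this example no demand is generated at second 4, and the counter
is incremented when the loop next runs at second 5.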
At the same time, for the second during which the demand was delayed,
the ASM would extrapolate the demand from the previous second. Unfortunately,
for the next second (for which no real demand had been generated) the
ASM would receive and act upon the delayed demand from the previous
second. This would effectively cause the telescope to "jump back"
the equivalent of one second in time (or fifteen arcseconds in RA for
a source at zero declination).
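The size of the "jump back" can be checked with a short sketch. The
assumptions here are mine: the ASM is modelled as extrapolating linearly
from its two previous demands (the report says only that it
extrapolates), the source is taken to move at the sidereal rate, and the
names asm_errors and jump_back_arcsec are hypothetical.

```python
import math

# Apparent sky rotation: 360 degrees per sidereal day (about 86164.09 s).
SIDEREAL_RATE_ARCSEC_PER_S = 360.0 * 3600.0 / 86164.0905


def jump_back_arcsec(delay_s, dec_deg):
    """On-sky offset caused by acting on a demand that is delay_s old."""
    return SIDEREAL_RATE_ARCSEC_PER_S * delay_s * math.cos(math.radians(dec_deg))


def asm_errors(n_seconds, delayed_tick, rate):
    """Applied-minus-ideal demand for each second (illustrative model).

    At delayed_tick the demand arrives late, so the ASM extrapolates
    (assumed linear). At delayed_tick + 1 no new demand exists, so the
    ASM acts on the late demand intended for delayed_tick.
    Requires delayed_tick >= 2 so that two earlier demands exist.
    """
    ideal = [rate * t for t in range(n_seconds)]
    applied = []
    for t in range(n_seconds):
        if t == delayed_tick:
            applied.append(2 * applied[-1] - applied[-2])  # extrapolated
        elif t == delayed_tick + 1:
            applied.append(ideal[delayed_tick])            # stale demand
        else:
            applied.append(ideal[t])
    return [a - i for a, i in zip(applied, ideal)]


print(round(jump_back_arcsec(1.0, 0.0), 2))
print([round(e, 2) for e in asm_errors(8, 3, SIDEREAL_RATE_ARCSEC_PER_S)])
```

Under these assumptions the extrapolated second carries no error, and
the whole offset of roughly fifteen arcseconds appears in the following
second, scaled by the cosine of the declination for sources away from
the equator.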
The missed tick problem was first reported on the evening of Sunday
7th February. It is highly likely that it was present at times
throughout the previous Thursday through Saturday, but not
noticed. The problem was investigated on Monday 8th, and finally
diagnosed and cured on Tuesday 9th February. Note that this fault
is independent of the transputer faults (which actually started
before the converter was disconnected).
The entire processing of the 1Hz loop normally takes approximately
50ms to complete, and is tightly synchronised to the absolute start of
each second. Although this is conjecture, it appears that the SCSI
poll (with a nominal 10 second timeout) was slowly drifting with
respect to this. When the two came into synch, the TEL task would
regularly miss one "tick" every ten seconds. This would remain the
case for approximately five hours. (Unfortunately the precise time at
which missed ticks occur is not automatically logged, and the
observers and telescope operators present at the time did not realise
the significance of the screen display). After this time, the poll
would move out of synch with the IEEE transfer, and the system would
start to function completely normally, again for periods of some
hours. Certainly there were periods of up to approximately eight hours
when the problem would have been seen if present, but no problem was
reported.
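As a rough consistency check on the drift conjecture, the following
sketch estimates how slowly the poll would have to drift for a collision
to persist for about five hours. It assumes that the collision window is
equal to the ~50ms loop processing time and that the poll itself is
instantaneous; both are simplifications of mine, and the function name
implied_drift is hypothetical. A wider effective window would imply a
proportionally faster drift.

```python
def implied_drift(window_s, episode_hours, poll_period_s=10.0):
    """Drift needed for the poll to cross window_s in episode_hours.

    Returns (drift per poll in seconds, fractional drift in parts per million).
    """
    polls_in_episode = episode_hours * 3600.0 / poll_period_s
    drift_per_poll_s = window_s / polls_in_episode
    drift_ppm = drift_per_poll_s / poll_period_s * 1e6
    return drift_per_poll_s, drift_ppm


per_poll, ppm = implied_drift(window_s=0.05, episode_hours=5.0)
print(f"{per_poll * 1e6:.1f} microseconds per poll, {ppm:.2f} ppm")
```

With these inputs the poll would need to slip by only a few tens of
microseconds per ten-second period, which is the kind of slow relative
drift the conjecture requires.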
Effects on the observing programme
Having described the symptoms, we believe that the actual effect this fault
will have had on the programmes being undertaken at the time will thankfully
be rather small. (This would not have been the case had the programmes
been mapping of extended strong sources.)
The most extreme effect is seen in jiggle maps of bright sources. In the
absolute worst case, the glitch could occur during the second at which
the jiggle map was sampling the central position, but the chance of
this happening is rather small. At the
other extreme, for a jiggle map made up of many integrations on a
blank field (or a very weak source), the only result would be perhaps
a very slight degradation of the signal to noise. We believe the
effect is much less pronounced in scan map data; the worst case would
be perhaps a 10% smearing of the main beam. However, we don't see any
real evidence of this in PSF fits to the scan map data of
calibrators.
For the photometry projects done towards the end of first shift, we
believe the effect may be even less noticeable. Most of these sources
are very faint, and probably need several hours for even a marginal
detection. We have analysed some of this data in several ways, looking
for example at the raw sample data (every second of the mini-phot
jiggle) and also at the jiggle coadd (9 secs on-source). There are no
real differences, and it would be
impossible to pinpoint the effect of a 15 arcsec shift (which happens once
every 10 secs) given the S/N per integration. For second shift, we
believe the same arguments will hold true.
One possible problem, however, is the effect on short calibration data.
Again, because the S/N is undoubtedly so low on the programme sources,
this may not even be so serious. Data taken on subsequent nights, under
near-identical conditions, after the fault was corrected, would
probably be adequate. Also it is clear that not all the calibrator data
was affected by the beam distortion.
I hope this note adequately describes the fault. Please contact me
if you would like further information. Many thanks to Wayne and Tim
for assistance in writing this report (remaining errors or misconceptions
are mine) and to the JAC software and computing services groups for
identifying the cause of the problem!
Richard Prestage
12th February 1999