Joint Astronomy Centre
JAMES CLERK MAXWELL TELESCOPE - IEEE FAULT REPORT

Telescope tracking fault: 4th - 9th February 1999 (HST)


Summary

From start of shift on Thursday 4th February 1999 until end of second shift on the morning of Tuesday 9th February, there was an intermittent tracking problem with the JCMT. While the effect was present, for one second in every ten, the demand to the servo would be delayed by one second. The fault was systematic in nature, in the sense that it would be present or absent for periods of approximately five hours at a time (the precise period is uncertain). Further details are given below.

Details

The JCMT uses a number of IEEE busses to allow the telescope control computer (MWTTEL) to communicate with the telescope micros and various pieces of commercial equipment. The original MWTTEL had a Q-bus backplane, and two IEEE interface cards plugged directly into this. When MWTTEL was replaced with a Vaxstation model 90, the IEEE interface cards were replaced by four SCSI-IEEE converters, each one controlling a separate IEEE bus.

At the start of January, one of these devices failed, and was replaced by a spare. Over time, the number of items connected via IEEE has decreased. In order to release the spare again, the devices were rearranged so that only three IEEE busses were required. However, at that date the fourth SCSI-IEEE converter was left connected to the SCSI bus.

On Thursday 4th February HST, on my recommendation, the fourth SCSI-IEEE converter was removed from MWTTEL. However, I did not realise that the IEEE device driver for that device also had to be disabled. The result of not disabling the driver was that every ten seconds the device driver would poll the SCSI ID of the missing converter to determine whether it had come back on line.

During normal operation of the JCMT, the Vax TEL task sends demand encoder values to the Antenna Servo Micro (ASM) once per second. At various times throughout the night, the device driver poll would "collide" with the IEEE transfer to the antenna micro, causing this transfer to take longer than a second to complete. The TEL task would miss out one whole iteration of its 1Hz loop (so no demand would be generated in the second following the delayed transfer). On the next second, the TEL task would realise that one iteration had been missed, and increment the "missed tick" counter, which is displayed on the status screen.

At the same time, for the second during which the demand was delayed, the ASM would extrapolate the demand from the previous second. Unfortunately, for the next second (for which no real demand had been generated) the ASM would receive and act upon the delayed demand from the previous second. This would effectively cause the telescope to "jump back" the equivalent of one second in time (or fifteen arcseconds in RA for a source at zero declination).
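
The sequence described above can be sketched as a small simulation. This is an illustrative Python model only, not the original Vax TEL or ASM code: the function names, the idealised constant-rate track, and the assumption that the ASM extrapolates perfectly during the delayed second are all hypothetical. The sidereal rate of about 15.04 arcseconds per second of time is the one external constant used.

```python
import math

# Sidereal tracking rate: arcseconds of RA per second of time.
SIDEREAL_RATE = 15.041


def sky_rate(dec_deg):
    """On-sky tracking rate in arcsec/s for a source at the given declination."""
    return SIDEREAL_RATE * math.cos(math.radians(dec_deg))


def simulate(n_seconds, collisions, rate):
    """Model the demand the ASM acts on in each second (hypothetical sketch).

    collisions: set of second indices at which the driver poll delays the
    IEEE transfer. A collision in second 0 is ignored for simplicity.
    Returns (errors, missed_ticks), with errors in arcsec (applied - ideal).
    """
    ideal = [rate * k for k in range(n_seconds)]
    applied = []
    missed_ticks = 0
    for k in range(n_seconds):
        if k in collisions and k > 0:
            # Demand for second k is late: the ASM extrapolates from
            # the demand it used in the previous second.
            applied.append(applied[-1] + rate)
        elif (k - 1) in collisions and k > 1:
            # No fresh demand was generated: the ASM acts on the delayed
            # demand from second k - 1, so the telescope "jumps back".
            applied.append(ideal[k - 1])
            # Counted here for simplicity: one missed tick per collision.
            missed_ticks += 1
        else:
            applied.append(ideal[k])
    errors = [a - i for a, i in zip(applied, ideal)]
    return errors, missed_ticks


if __name__ == "__main__":
    errors, missed = simulate(20, {5, 15}, sky_rate(0.0))
    for second, err in enumerate(errors):
        print(f"t={second:2d} s  error={err:8.3f} arcsec")
    print("missed ticks:", missed)
```

In this model the error is essentially zero during the delayed second itself (the extrapolation covers it) and equal to minus one second of tracking in the following second, which is about fifteen arcseconds at zero declination and smaller by a factor of cos(declination) elsewhere.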

The missed tick problem was first reported on the evening of Sunday 7th February. It is highly likely that it was present at times throughout the previous Thursday through Saturday, but not noticed. The problem was investigated on Monday 8th, and finally diagnosed and cured on Tuesday 9th February. Note that this fault is independent of the transputer faults (which actually started before the converter was disconnected).

The entire processing of the 1Hz loop normally takes approximately 50ms to complete, and is tightly synchronised to the absolute start of each second. Although this is conjecture, it appears that the SCSI poll (with a nominal 10 second timeout) was slowly drifting with respect to this. When the two came into synch, the TEL task would regularly miss one "tick" every ten seconds. This would remain the case for approximately five hours. (Unfortunately the precise times at which missed ticks occur are not automatically logged, and the observers and telescope operators present at the time did not realise the significance of the screen display). After this time, the poll would move out of synch with the IEEE transfer, and the system would start to function completely normally, again for periods of some hours. Certainly there were periods of up to approximately eight hours when the problem would have been seen if present, but no problem was reported.
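
To give a feel for how slow such a drift would have to be, the following back-of-envelope Python sketch works out the drift implied by the figures quoted above. It rests on assumptions that are not established in this report: that the drift is constant, and that a collision occurs only while the poll falls inside the roughly 50ms loop window. The real collision window may well be wider, so the numbers are illustrative only, and the function names are hypothetical.

```python
POLL_PERIOD_S = 10.0    # nominal SCSI poll interval quoted above
LOOP_WINDOW_S = 0.050   # approximate duration of the 1Hz loop processing


def polls_during(hours, poll_period_s=POLL_PERIOD_S):
    """Number of driver polls issued in the given number of hours."""
    return hours * 3600.0 / poll_period_s


def implied_drift_per_poll(window_s, in_sync_hours, poll_period_s=POLL_PERIOD_S):
    """Drift per poll (seconds) needed to cross the window in in_sync_hours.

    Assumes a constant drift and that collisions happen only while the
    poll lies inside the window (both are assumptions, not measurements).
    """
    return window_s / polls_during(in_sync_hours, poll_period_s)


if __name__ == "__main__":
    drift = implied_drift_per_poll(LOOP_WINDOW_S, 5.0)
    print(f"polls in five hours: {polls_during(5.0):.0f}")
    print(f"implied drift per poll: {drift * 1e6:.1f} microseconds")
    print(f"fractional drift: {drift / POLL_PERIOD_S * 1e6:.2f} parts per million")
```

Under these assumptions the poll would slip by only a few tens of microseconds per ten-second cycle, which is why the fault could persist, or stay away, for hours at a time.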

Effects on the observing programme

Having described the symptoms, we believe that the actual effect this fault will have had on the programmes being undertaken at the time will thankfully be rather small. (This would not have been the case had the programmes been mapping of extended strong sources.)

The most extreme effect is seen in jiggle maps of bright sources. In the absolute worst case, the glitch could occur during the second at which the jiggle map was sampling the central position, but the chance of this happening is rather small. At the other extreme, for a jiggle map made up of many integrations on a blank field (or a very weak source), the only result would be perhaps a very slight degradation of the signal to noise. We believe the effect is much less pronounced in scan map data; the worst case would be perhaps a 10% smearing of the main beam. However, we don't see any real evidence of this looking at PSF fits to the scan map data of calibrators.

For the photometry projects done towards the end of first shift, we believe the effect may be even less noticeable. Most of these sources are very faint, and probably need several hours for even a marginal detection. We have analysed some of this data in several ways, looking for example at the raw sample data (every second of the mini-phot jiggle) and also at the jiggle coadd (9 secs on-source). There are no real differences, and it would be impossible to pinpoint the effect of a 15 arcsec shift (which happens once every 10 secs) given the S/N per integration. For second shift, we believe the same arguments will hold true.

One possible problem, however, is the effect on short calibration data. Again, because the S/N is undoubtedly so low on the programme sources, this may not even be so serious. Data taken on subsequent nights, under near-identical conditions, after the fault was corrected, would probably be adequate. Also it is clear that not all the calibrator data was affected by the beam distortion.

I hope this note adequately describes the fault. Please contact me if you would like further information. Many thanks to Wayne and Tim for assistance in writing this report (remaining errors or misconceptions are mine) and to the JAC software and computing services groups for identifying the cause of the problem!

Richard Prestage
12th February 1999

Contact: Remo Tilanus. Updated: Mon Nov 8 14:54:40 HST 2004
