It is always a good day when I drive up the mountain to a broken telescope, then drive down leaving a working telescope. Easy to say, not always easy to accomplish, the simple statement obscuring a day of struggle to solve the problem and fix it.
Such a day was Monday.
The Keck 2 telescope drive is a complex beast of dozens of relays, miles of cabling, servo amplifiers and power supplies, plus several circuit boards designed and built in the 1980s holding a bewildering array of arcane logic.
The complexity is not helped at all by the new drive system layered on top, a modern replacement for the decades-old circuitry, with both systems kept operational at the same time to avoid downtime during the transition. The new TCSU system adds a monstrous spaghetti mass of wire, plus dozens more relays, that allows the two systems to be switched in and out. Simply opening the cabinet could make an engineer run screaming from the mountain.
I feared that the issue with the telescope was my fault. On Saturday I had installed a new version of code into the local controls PLC, a controller tasked with critical telescope safety interlocks. A failure of this new code would indeed stop the telescope from operating correctly. I had spent a full day testing the new code before release, plus half of the next day on follow-up testing. I was pretty sure that my code was operating properly. Pretty sure, but not certain; the timing was all too suspicious.
It was Sunday that everything went off in the proverbial handbasket. It was Sunday afternoon that the phone calls began; things were not good. We managed to struggle through the night, but it took a lot of effort by several people to get the telescope moving, doing what we could over the remote control links. It was not long into the afternoon that I realized I would be spending my Monday on the summit; this would take some hands-on troubleshooting.
We had failure one fixed before lunch. Tracing the lack of power to the brake contactors back through the circuitry led us to CR50, a relay that enabled power to the contactors. We found that the relay was not operating: there was no power to the coil. Upon finding a blown fuse we thought we had the solution, but replacing the fuse simply smoked the relay, burning out the coil. Replacing both the fuse and the relay fixed the issue; we could now move the telescope in manual mode.
Good, we fixed it.
Sure about that?
More testing after lunch. We had DCS mode working, the old telescope drive system. The new system, TCSU, refused to operate the telescope manually; drive power would not enable. With a few phone calls to the engineer who had designed much of the TCSU system we tracked down the basic issue: the command was simply not getting from the manual control panel to the PLC, a wiring issue.
We knew that the problem was somewhere in the relay box that allowed one system to be switched to the other, and that is where we stopped. It was time to turn the telescope over to the support techs to release for the night. We would be operating using the old system, so a failure in the new system was something we could fix later.
Feeling like we had the issue under control, I sat in the engineering office writing up some maintenance logs on what we had done.
Then another radio call… We have a new problem.
The azimuth drive would power up and release the brakes without a command to do so. This is a very troubling thing; it could be seriously unsafe if the telescope were to move unexpectedly!
Now near the end of our work day, I started again to troubleshoot a new problem. It took a bit over an hour, tracing the problem to a particular driver chip on the logic board. It looked like it had a shorted output, asserting the enable command without anyone actually pressing the button. Replace the chip and we are back in business, problem solved, at about a quarter to 5pm and our scheduled departure. We scramble to finish releasing the telescope, performing the final checks for the night.
At least my work on the local controls interlock system was not to blame, a bit of a relief. We had three problems with two solved and one at least narrowed to a particular set of cabling and one relay. Good solutions as well, defects that matched the symptoms, with one issue solved by finding a literal smoking gun relay.
Even more satisfying, the telescope operated correctly through the night, with not a hint of trouble from the drives. Reading the night logs the next morning was a relief. I could celebrate a little, at least for now. It will break again… someday, but not today.
So? Now I have three separate failures in the telescope drive hardware, all occurring over the same few days. There is no commonality in these failures, no single event I can think of that could have caused them. Did we accidentally damage the driver IC while tracing the first problem? I do not believe so; we were working on different circuits. It is mysteries like these that have engineers believing in gremlins, or the menehunes of island lore.