From ALAN at MC.LCS.MIT.EDU Fri Jan 24 08:24:13 1986
From: ALAN at MC.LCS.MIT.EDU (Alan Bawden)
Date: Jan 24 86 02:24:13 EST
Subject: pifail again
In-Reply-To: Msg of Fri 24 Jan 86 01:20:52 EST from Pandora B. Berman
Message-ID: <[MC.LCS.MIT.EDU].794321.860124.ALAN>

    Date: Fri, 24 Jan 86 01:20:52 EST
    From: Pandora B. Berman
    Re: pifail again
    is the name of the latest crash dump, again during an incr. dump.

NXM on the Unibus again. This time the victimized instruction:

MGWJD1: IORDI T,%TMCS1          ;Get controller status

Unfortunately, all of a sudden Penny seems unable to get through an
incremental dump without having this (or some Unibus NXM) happen.
(Unlike the Chaos net board glitch the other day which only happened
once.) Maybe the LH/DH really -does- have something to do with this?
Or perhaps someone needs to physically tighten up everything on that
Unibus.

From CENT5%AI.AI.MIT.EDU at MC.LCS.MIT.EDU Fri Jan 24 07:20:52 1986
From: CENT5%AI.AI.MIT.EDU at MC.LCS.MIT.EDU (Pandora B. Berman)
Date: Jan 24 86 01:20:52 EST
Subject: pifail again
Message-ID: <[AI.AI.MIT.EDU].11649.860124.CENT5>

is the name of the latest crash dump, again during an incr. dump.

From JTW%MIT-SPEECH at MIT-MC.ARPA Tue Jan 21 00:00:00 1986
From: JTW%MIT-SPEECH at MIT-MC.ARPA (John Wroclawski)
Date: Tue 21 Jan 86, 00:00
Subject: ai crash
In-Reply-To: Message from "David A. Moon " of Tue 21 Jan 86 14:15:38-EST
Message-ID:

    Does anyone know how long the NXM timeout on a KS10 unibus is?
    The Chaos board can be slower to respond (3 or 4 microseconds?)
    when you read it twice in a row, as I recall.

I think I remember it being 20 us.

    Are the KS10 cabinet and the LH/DH cabinet firmly grounded to
    each other? If not, connecting a Unibus cable between them might
    cause electrical problems.

That's a thought; they're not.

I'm inclined to write this off to cosmic rays, and worry about it if
it ever happens again after the hardware configuration of that unibus
settles down.
-------

From Moon at SCRC-STONY-BROOK.ARPA Tue Jan 21 19:51:00 1986
From: Moon at SCRC-STONY-BROOK.ARPA (David A. Moon)
Date: Jan 21 86 13:51 EST
Subject: ai crash
In-Reply-To: <[MC.LCS.MIT.EDU].790256.860120.ALAN>
Message-ID: <860121135123.3.MOON@EUPHRATES.SCRC.Symbolics.COM>

    Date: Mon, 20 Jan 86 23:00:11 EST
    From: Alan Bawden

        Date: Mon, 20 Jan 86 22:32:34 EST
        From: Pandora B. Berman
        dumped to CRASH;PI-IN PROGRS, which is what the PI lev. 6
        bughalt complained about. see log for numbers. it was
        checking the incr. dump when this happened. after AI came
        up, the first time i tried to run ICHECK, DUMP got an error;
        it mentioned RH11 err and maybe MAG TAPE IN DEV. HUNG or
        something -- see sys log for details. the second time i
        tried ICHECK, it worked.

    The immediate cause of the crash was here in the interrupt level
    Chaos net code:

    CHSRC5: IORDI B,CAIRBF          ;Read out the data, halfwords
            IORDI C,CAIRBF

    The second read from CAIRBF got a Non-Existent I/O Register
    error; just like someone had suddenly unplugged the Chaos board.

I have a vague memory that Chaos boards do this sometimes. I hope I'm
wrong. Alan and I discussed kludges for making the software resilient
to this, but I hope we don't have to resort to them. It would be in
the grand tradition of the previous two MIT-AI machines, though.

    If there were system messages from the magtape code indicating
    that it was unhappy as well, then perhaps we can conclude that
    the fault happened somewhere in the "I" Unibus itself. (It would
    be nice if :SYSMSG worked on a crash dump!)

I used to have a program to do this, I think. Better would be to
stick SYSMSG inside PEEK and then take advantage of PEEK's existing
crash-dump-analysis feature.

    Perhaps someone who knows more Unibusology than I do can offer an
    opinion about what might cause this? Remember that this is the
    Unibus that supports the magtape drive, the DZ11's and the
    Chaosnet interface.
    The magtape code was shooting bits back and forth like crazy at
    the time, presumably that contributed somehow?

    JTW: Is the LH/DH plugged into this bus right now? Perhaps it did
    something nasty?

JTW: Can you look at the crash dump and figure out whether the
magtape RH-11 was supposed to be doing DMA at the time this crash
happened? Maybe it and the Chaos board interfere with each other
somehow?

Does anyone know how long the NXM timeout on a KS10 unibus is? The
Chaos board can be slower to respond (3 or 4 microseconds?) when you
read it twice in a row, as I recall.

Are the KS10 cabinet and the LH/DH cabinet firmly grounded to each
other? If not, connecting a Unibus cable between them might cause
electrical problems.

From JTW%MIT-SPEECH at MIT-MC.ARPA Mon Jan 20 00:00:00 1986
From: JTW%MIT-SPEECH at MIT-MC.ARPA (John Wroclawski)
Date: Mon 20 Jan 86, 00:00
Subject: ai crash
In-Reply-To: Message from "Alan Bawden " of Mon 20 Jan 86 23:06:09-EST
Message-ID:

    JTW: Is the LH/DH plugged into this bus right now? Perhaps it did
    something nasty?

Yes, and maybe so. Specially since it doesn't seem to work like it is
supposed to right at the moment. It would be nice to know the state
of the UBA at that point in time...
-------

From ALAN at MC.LCS.MIT.EDU Tue Jan 21 05:00:11 1986
From: ALAN at MC.LCS.MIT.EDU (Alan Bawden)
Date: Jan 20 86 23:00:11 EST
Subject: ai crash
In-Reply-To: Msg of Mon 20 Jan 86 22:32:34 EST from Pandora B. Berman
Message-ID: <[MC.LCS.MIT.EDU].790256.860120.ALAN>

    Date: Mon, 20 Jan 86 22:32:34 EST
    From: Pandora B. Berman
    dumped to CRASH;PI-IN PROGRS, which is what the PI lev. 6 bughalt
    complained about. see log for numbers. it was checking the incr.
    dump when this happened. after AI came up, the first time i tried
    to run ICHECK, DUMP got an error; it mentioned RH11 err and maybe
    MAG TAPE IN DEV. HUNG or something -- see sys log for details.
    the second time i tried ICHECK, it worked.
The immediate cause of the crash was here in the interrupt level
Chaos net code:

CHSRC5: IORDI B,CAIRBF          ;Read out the data, halfwords
        IORDI C,CAIRBF

The second read from CAIRBF got a Non-Existent I/O Register error;
just like someone had suddenly unplugged the Chaos board.

If there were system messages from the magtape code indicating that
it was unhappy as well, then perhaps we can conclude that the fault
happened somewhere in the "I" Unibus itself. (It would be nice if
:SYSMSG worked on a crash dump!)

Perhaps someone who knows more Unibusology than I do can offer an
opinion about what might cause this? Remember that this is the Unibus
that supports the magtape drive, the DZ11's and the Chaosnet
interface. The magtape code was shooting bits back and forth like
crazy at the time, presumably that contributed somehow?

JTW: Is the LH/DH plugged into this bus right now? Perhaps it did
something nasty?

From CENT%AI.AI.MIT.EDU at MC.LCS.MIT.EDU Tue Jan 21 04:32:34 1986
From: CENT%AI.AI.MIT.EDU at MC.LCS.MIT.EDU (Pandora B. Berman)
Date: Jan 20 86 22:32:34 EST
Subject: ai crash
Message-ID: <[AI.AI.MIT.EDU].11362.860120.CENT>

dumped to CRASH;PI-IN PROGRS, which is what the PI lev. 6 bughalt
complained about. see log for numbers. it was checking the incr. dump
when this happened. after AI came up, the first time i tried to run
ICHECK, DUMP got an error; it mentioned RH11 err and maybe MAG TAPE
IN DEV. HUNG or something -- see sys log for details. the second time
i tried ICHECK, it worked.

From TY at XX.LCS.MIT.EDU Fri Jan 17 00:00:00 1986
From: TY at XX.LCS.MIT.EDU (J. J. Tyrone Sealy)
Date: Fri 17 Jan 86, 00:00
Subject: tape lossage
In-Reply-To: <860117134457.5.MOON@EUPHRATES.SCRC.Symbolics.COM>
Message-ID: <12176026589.28.TY@XX.LCS.MIT.EDU>

If you can come over. Please do. Unless there is someone else that
can fix it.
tnx..--TY
-------

From ALAN at MC.LCS.MIT.EDU Fri Jan 17 21:51:48 1986
From: ALAN at MC.LCS.MIT.EDU (Alan Bawden)
Date: Jan 17 86 15:51:48 EST
Subject: tape lossage
In-Reply-To: Msg of Fri 17 Jan 86 13:44 EST from David A. Moon
Message-ID: <[MC.LCS.MIT.EDU].787522.860117.ALAN>

    Date: Fri, 17 Jan 86 13:44 EST
    From: David A. Moon

        Date: Fri, 17 Jan 86 07:43:58 EST
        From: "Pandora B. Berman"
        i wandered over to bring up MC when it crashed, and noticed
        tape 4000 on the drive. apparently Ty was running a full
        dump. the dump log contains a note on the MC 4000 line:
        "Something happened". dave, didn't you say something about
        4000 being the max. number for an ITS tape? how does this
        get fixed?

    I don't recall saying that. There is a limit on the highest tape
    number that's controlled by the size of the SYSENG;MACRO TAPES
    file. The limit can be changed; I don't remember how, but it
    involves code in DUMP that's commented something like "Don't do
    this unless you are RJL, and even then be careful." We probably
    don't need to import RJL to do it. If you need me to come over
    and figure out the details, ask.

This limit must have been reached in the past since the full dump
tapes recorded in the database only go back to some time in 1983. But
just in case, I figured out the way to change the limit and upped it
to 5000. So we probably don't need to worry about this limit on MC
ever again...

From Moon at SCRC-STONY-BROOK.ARPA Fri Jan 17 19:44:00 1986
From: Moon at SCRC-STONY-BROOK.ARPA (David A. Moon)
Date: Jan 17 86 13:44 EST
Subject: tape lossage
In-Reply-To: <[AI.AI.MIT.EDU].11164.860117.CENT>
Message-ID: <860117134457.5.MOON@EUPHRATES.SCRC.Symbolics.COM>

    Date: Fri, 17 Jan 86 07:43:58 EST
    From: "Pandora B. Berman"
    i wandered over to bring up MC when it crashed, and noticed tape
    4000 on the drive. apparently Ty was running a full dump. the
    dump log contains a note on the MC 4000 line: "Something
    happened". dave, didn't you say something about 4000 being the
    max. number for an ITS tape?
    how does this get fixed?

I don't recall saying that. There is a limit on the highest tape
number that's controlled by the size of the SYSENG;MACRO TAPES file.
The limit can be changed; I don't remember how, but it involves code
in DUMP that's commented something like "Don't do this unless you are
RJL, and even then be careful." We probably don't need to import RJL
to do it. If you need me to come over and figure out the details,
ask.

From CENT%AI.AI.MIT.EDU at MC.LCS.MIT.EDU Fri Jan 17 13:43:58 1986
From: CENT%AI.AI.MIT.EDU at MC.LCS.MIT.EDU (Pandora B. Berman)
Date: Jan 17 86 07:43:58 EST
Subject: tape lossage
Message-ID: <[AI.AI.MIT.EDU].11164.860117.CENT>

i wandered over to bring up MC when it crashed, and noticed tape 4000
on the drive. apparently Ty was running a full dump. the dump log
contains a note on the MC 4000 line: "Something happened". dave,
didn't you say something about 4000 being the max. number for an ITS
tape? how does this get fixed?

From CPH at MC.LCS.MIT.EDU Sun Jan 12 00:25:45 1986
From: CPH at MC.LCS.MIT.EDU (Chris Hanson)
Date: Jan 11 86 18:25:45 EST
Subject: Losing Dialup
Message-ID: <[MC.LCS.MIT.EDU].780740.860111.CPH>

When I dial x6985, I am getting a connection which responds to my
carriage return with the standard "Connected to MC.", but then it
fails to give me a HACTRN. C-Z has no effect. I notice that *nobody*
is logged in from a dialup. This seems like it might be related.

From JSOL%BUCS20%bostonu.csnet at CSNET-RELAY.ARPA Sun Jan 5 23:09:00 1986
From: JSOL%BUCS20%bostonu.csnet at CSNET-RELAY.ARPA (Jon Solomon)
Date: Jan 5 1986 17:09 EST
Subject: [JSOL: TELECOM] Consider this a warning.
In-Reply-To: Msg of 5 Jan 1986 15:26-EST from Alan Bawden
Message-ID: <[BUCS20].JSOL. 5-Jan-86 17:09:02>

Okay, now I know the intended audience for my message. One fact that
I forgot to mention in the other message was that this JUST started
happening about a week ago. Whoever is hacking COMSAT, please take
note.
Thanks,
--JSol

From ALAN at MC.LCS.MIT.EDU Sun Jan 5 21:26:51 1986
From: ALAN at MC.LCS.MIT.EDU (Alan Bawden)
Date: Sun, 5 Jan 86 15:26:51 EST
Subject: [JSOL: TELECOM] Consider this a warning.
Message-ID: <[MC.LCS.MIT.EDU].773515.860105.ALAN>

    MSG: *MSG 4866
    Date: 01/05/86 13:22:00
    From: JSOL at XX.LCS.MIT.EDU
    To: *BBOARD at XX.LCS.MIT.EDU
    Re: TELECOM

    Received: from XX.LCS.MIT.EDU by MC.LCS.MIT.EDU 5 Jan 86 13:21:49 EST
    Date: Sun 5 Jan 86 13:24:23-EST
    From: Jon Solomon
    Subject: TELECOM
    To: BBOARD at MC.LCS.MIT.EDU
    Message-ID: <12172857686.19.JSOL at XX.LCS.MIT.EDU>

    Due to the installation of a new mail system, I can no longer
    ship off TELECOM to MC. Since there are quite a large number of
    MC users on TELECOM, and considering the fact that this
    restriction might affect other digests, I am sending this message
    to your bulletin board rather than individually to MC users.

I -believe- what he is referring to is the fact that digests tend to
be large enough that they exceed COMSAT's pitifully small size
limitation.

I note that CSTACY claimed the lock for hacking COMSAT two weeks ago,
hacked on it for an evening, and hasn't logged in since then. There
are now about 130 BADREQ files on .MAIL2, many of them 2 weeks old.
(I'm going to have to create .MAIL3 soon...)

Warning: If the day ever comes that I feel that it is
Up-To-Me-To-Do-Something about COMSAT (because of address space
problems, lack of proper domain support, or whatever) I will simply
advise everybody that ITS no longer supports mail for users or mail
forwarding for the network and I will shut it off. I feel this day
-rapidly- approaching. I don't see any competent programmers making
the kind of necessary effort it is going to take to straighten out
this mess. I am forwarding this message to a large audience in the
hopes that somebody will get inspired, but I realize this is grasping
at straws.

From JNC at MC.LCS.MIT.EDU Fri Jan 3 06:33:24 1986
From: JNC at MC.LCS.MIT.EDU (J. Noel Chiappa)
Date: Fri, 3 Jan 86 00:33:24 EST
Subject: Static routes in MC's routing table
Message-ID: <[MC.LCS.MIT.EDU].771596.860103.JNC>

There are way too many (old) static routes in the table. They don't
seem to get updated correctly; data for SCRC was going through the
(loaded) MIT-GW instead of the (idle) MIT-AI-GW, although the rest of
the Internet got the change months ago. Someone should delete all but
the necessary ones. I deleted the SCRC one and patched it out in the
running system, with the result that it instantly picked up the right
one.

Also, the ICMP Redirect code is not the best possible in that it does
not handle per-Host Redirects well; it folds them all into the single
net entry. When we start attaching ITSen to subnetted nets, this will
lose big; traffic to different subnets will thrash the cache line.
For that matter, the ITS IP layer doesn't use the correct 'mask'
algorithm for dealing with host addresses (per RFC940, etc).
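
The per-Host Redirect and 'mask' points in the last message can be
sketched roughly as follows. This is an illustration in Python, not
the ITS IP code: the RouteTable class, the gateway names and the 10.x
addresses are all hypothetical. It assumes a table keyed by network
and mask, where a Redirect for one host is stored as its own
all-ones-mask entry instead of overwriting the covering net entry,
and where lookup picks the most specific entry whose masked address
matches.

```python
import ipaddress


class RouteTable:
    """Toy routing table (hypothetical, for illustration only).

    Entries are keyed by network-and-mask; lookup returns the gateway
    of the most specific (longest-mask) entry that matches.
    """

    def __init__(self):
        # Maps IPv4Network -> gateway name. A /32 entry is a host route.
        self.routes = {}

    def add(self, prefix, gateway):
        """Add or replace the route for a network such as '10.1.0.0/16'."""
        self.routes[ipaddress.ip_network(prefix)] = gateway

    def redirect(self, host, gateway):
        """Record an ICMP Redirect for a single host.

        The redirect becomes its own /32 entry, so the covering net or
        subnet entry is left untouched for every other host.
        """
        self.add(f"{host}/32", gateway)

    def lookup(self, host):
        """Return the gateway for host, or None if nothing matches."""
        addr = ipaddress.ip_address(host)
        # 'addr in net' is equivalent to (addr & mask) == network address.
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]
```

With a net entry for 10.0.0.0/8 and subnet entries for 10.1.0.0/16 and
10.2.0.0/16, a redirect for host 10.1.0.5 changes only that host's
route. Other hosts on either subnet keep their existing gateways, so
traffic to different subnets no longer fights over one net entry.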