From DCP at SCRC-QUABBIN.ARPA Wed Jul 31 18:45:00 1985
From: DCP at SCRC-QUABBIN.ARPA (David C. Plummer in disguise)
Date: Jul 31 85 12:45 EDT
Subject: It's definite - TOPS-20 loses!
In-Reply-To: The message of 28 Jul 85 07:57-EDT from Ken Harrenstien
Message-ID: <850731124518.9.NFEP@NEPONSET.SCRC.Symbolics.COM>

    Date: Sun 28 Jul 85 04:57:33-PDT
    From: Ken Harrenstien

    I was able to log both ends of a telnet connection (using TOPS-20 TN here, and the ITS datagram logger on MC) and captured an instance of the lossage. MC is sending a repacketized segment which TOPS-20 incorrectly treats as additional data.

    That is, MC's TCP sends off three separate segments, and then when no ACK is received it decides to retransmit, but is clever and lumps all the data together into a single segment which it then retransmits. This new segment has the proper sequence number (same as seq # of the original 1st segment). The data in this segment is exactly that data which is duplicated in TN's log file. The only possible explanation is that the TOPS-20 monitor's TCP code has a long-standing bug in it.

    It now occurs to me that I have seen this before when TN'ing to a VAX 4.2BSD system. I always blamed the BSD code for this, and other people claimed this was due to a bug in 4.2 server telnet, but somehow it never seemed to go away. Looks as if this was actually a TOPS-20 bug! I will pursue this with the appropriate people.

Many too many implementations have the "Doesn't correctly handle overlapping segments" bug. The trickiest one is receiving a segment some of whose data has already been acknowledged and some of whose data is new. *sigh*

From CSTACY at MIT-MC.ARPA Wed Jul 31 10:20:51 1985
From: CSTACY at MIT-MC.ARPA (Christopher C. Stacy)
Date: Jul 31 85 04:20:51 EDT
Subject: response to old bug report of mine: TS3TTY absolved
Message-ID: <[MIT-MC.ARPA].594867.850731.CSTACY>

Some time back I had reported a problem where when I first connected to MC, the first few characters of each line appeared to be garbage. This only happened on dialup lines and would persist until I set the terminal type. It really looked like ITS had initialized my TTY to some wrong type -- the illegible graphics symbols appearing on my AAA screen were likely-looking control codes or padding. I didn't want to believe that the TTY code was broken, and some other people asserted that my problem was not possibly the fault of ITS.

Well, the other day I mentioned to my roommate that I was going to go take a look at TS3TTY and convince myself it could not possibly be broken. He responded by poring over the AAA terminal documentation and frobbing the terminal, and finally located the real culprit. Apparently the AAA has this feature (which we had enabled) where you can turn some bit on if you desire the first few characters of each line to be trashed for you.

So, this message is just a response for the record about that old bug report I filed. The ITS TTY code was *not* at fault.

From KLH at SRI-NIC.ARPA Sun Jul 28 00:00:00 1985
From: KLH at SRI-NIC.ARPA (Ken Harrenstien)
Date: Sun 28 Jul 85, 00:00
Subject: It's definite - TOPS-20 loses!
Message-ID:

I was able to log both ends of a telnet connection (using TOPS-20 TN here, and the ITS datagram logger on MC) and captured an instance of the lossage. MC is sending a repacketized segment which TOPS-20 incorrectly treats as additional data.

That is, MC's TCP sends off three separate segments, and then when no ACK is received it decides to retransmit, but is clever and lumps all the data together into a single segment which it then retransmits. This new segment has the proper sequence number (same as seq # of the original 1st segment).
The data in this segment is exactly that data which is duplicated in TN's log file. The only possible explanation is that the TOPS-20 monitor's TCP code has a long-standing bug in it.

It now occurs to me that I have seen this before when TN'ing to a VAX 4.2BSD system. I always blamed the BSD code for this, and other people claimed this was due to a bug in 4.2 server telnet, but somehow it never seemed to go away. Looks as if this was actually a TOPS-20 bug! I will pursue this with the appropriate people.
-------

From KLH at MIT-MC.ARPA Sun Jul 28 13:07:01 1985
From: KLH at MIT-MC.ARPA (Ken Harrenstien)
Date: Jul 28 85 07:07:01 EDT
Subject: RMAIL display problem - yet more data
Message-ID: <[MIT-MC.ARPA].591010.850728.KLH>

I was finally able to provoke this bug simply with DDT ^R typeout of a file -- conclusive proof that the problem does not lie with TECO, EMACS, or RMAIL. It is not reproducible however (doing another typeout will not lose in the same place, if at all).

From KLH at MIT-MC.ARPA Sun Jul 28 12:58:50 1985
From: KLH at MIT-MC.ARPA (Ken Harrenstien)
Date: Jul 28 85 06:58:50 EDT
Subject: RMAIL display problem - yet more data
Message-ID: <[MIT-MC.ARPA].590992.850728.KLH>

Well, with considerable pain I was able to cause an example of this lossage while keeping a TCP datagram log. However, the log doesn't show what I expected; I was looking for the stretch of duplicated text that I observed, and couldn't find it. There are some retransmissions but they are all correct.

Until GSB commented on the fact that the extra stuff seemed to be a duplicate of previous stuff, I hadn't noticed this attribute, but since then I've checked every instance and this appears to be always true. Something somewhere is being retransmitted or re-used.

Since this happens with both CTN (CRTSTY SUPDUP) and TOPS-20 TN, it isn't a TOPS-20 user-program problem.
Since the outgoing datagram log on MC shows no problems, the obvious deduction is that this looks like a TOPS-20 monitor problem. As it happens, the duplicated stuff does appear to correspond to a re-packetized TCP segment. More tests will be necessary to confirm this, however.

This also implies that GSB's problem is actually something different from this one. Since he mentioned it happening with PEEK, I think we should confine further discussion to BUG-ITS and leave out BUG-RMAIL,TECO,EMACS unless more information turns up.

From ALAN at MIT-MC.ARPA Sat Jul 27 04:10:20 1985
From: ALAN at MIT-MC.ARPA (Alan Bawden)
Date: Jul 26 85 22:10:20 EDT
Subject: 7LP: and 7LR:
Message-ID: <[MIT-MC.ARPA].589842.850726.ALAN>

Remember the 7LP: device I advertised in this spot last winter? (It sends output to the LN01 printer on the 7th floor.) Well, I have just installed a 7LR: device for sending output to the new laserwriter (also on the 7th floor).

While I was at it I gave both devices a new feature. They now support deletion so you can delete items from the queue. For example, if 7LP^F shows you the following:

    7th floor ln01 is ready and printing
      Time   Owner  Job  Files                Size
     *21:55  alan   905  7LP: BAWDEN; B 249   49481
    The most recent job printed was:
      21:21  alan        7LP: BAWDEN; .FILE. (DIR)

then you can delete job 905 by doing either ^O 7LP:905 or ^O 7LP:ALAN. In the latter case all entries owned by ALAN are deleted. The second filename and directory are ignored.

From ALAN at MIT-MC.ARPA Thu Jul 25 22:39:03 1985
From: ALAN at MIT-MC.ARPA (Alan Bawden)
Date: Jul 25 85 16:39:03 EDT
Subject: Hardware
Message-ID: <[MIT-MC.ARPA].587801.850725.ALAN>

The parity errors MC was getting today seem to be closely correlated to when the tape drive was being used. I don't know when DEC is coming back to fix the first memory box, but perhaps some diagnostics should be run to see if the tape drive and its DF10 are functioning properly.
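
The overlapping-segment case described in the "It's definite - TOPS-20 loses!" thread (three segments retransmitted as one repacketized segment carrying the first segment's sequence number) can be illustrated with a small receiver-side sketch. This is not the TOPS-20, ITS, or BSD code; it is a minimal illustration under stated assumptions: the names `accept_segment`, `receive`, and `rcv_nxt` are hypothetical, sequence numbers are plain integers with no wraparound, and segments that arrive beyond the expected sequence number are simply dropped rather than queued.

```python
def accept_segment(rcv_nxt, seg_seq, data):
    """Return (new_rcv_nxt, fresh_bytes) for one arriving segment.

    rcv_nxt is the next sequence number the receiver expects. Any bytes
    of the segment that lie below rcv_nxt were already delivered, so
    they are trimmed off instead of being handed to the user again.
    Assumptions of this sketch: no sequence-number wraparound, and no
    reassembly queue for segments that start past rcv_nxt.
    """
    offset = rcv_nxt - seg_seq      # count of leading bytes already received
    if offset < 0:
        return rcv_nxt, b""         # gap before this segment: ignored here
    fresh = data[offset:]           # empty if the whole segment is old data
    return rcv_nxt + len(fresh), fresh


def receive(segments, rcv_nxt=0):
    """Feed (seq, data) pairs through accept_segment; return (bytes, rcv_nxt)."""
    delivered = bytearray()
    for seg_seq, data in segments:
        rcv_nxt, fresh = accept_segment(rcv_nxt, seg_seq, data)
        delivered += fresh
    return bytes(delivered), rcv_nxt


if __name__ == "__main__":
    # Three original segments, then one repacketized retransmission
    # that starts at the sequence number of the first segment.
    arrivals = [(0, b"abc"), (3, b"def"), (6, b"ghi"), (0, b"abcdefghi")]
    print(receive(arrivals))
```

With this trimming in place the repacketized retransmission contributes no new bytes, and a segment that is partly old and partly new contributes only its new tail. A receiver that instead appended the whole retransmitted segment would show exactly the kind of duplicated text reported in TN's log.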
From ALAN at MIT-MC.ARPA Thu Jul 25 00:06:43 1985
From: ALAN at MIT-MC.ARPA (Alan Bawden)
Date: Jul 24 85 18:06:43 EDT
Subject: I thought we were more carefull than this...
Message-ID: <[MIT-MC.ARPA].586331.850724.ALAN>

MC was unusable for most of today because DEC brought up the machine with the T-300's write-protected. ITS should really say something more explicit when this happens than "T300 ERR ... STATUS = 1 ...".

From GSB at MIT-MC.ARPA Wed Jul 24 04:37:28 1985
From: GSB at MIT-MC.ARPA (Glenn S. Burke)
Date: Jul 23 85 22:37:28 EDT
Subject: RMAIL display problem
Message-ID: <[MIT-MC.ARPA].586008.850723.GSB>

I can make it happen with peek on a 60 high 118 wide screen, just like i can with rmail. looks like the cursor positioning goes bonkers as a function of me typing at it.

From GSB at MIT-MC.ARPA Tue Jul 23 00:06:15 1985
From: GSB at MIT-MC.ARPA (Glenn S. Burke)
Date: Jul 22 85 18:06:15 EDT
Subject: RMAIL display problem -- you'll like this
Message-ID: <[MIT-MC.ARPA].584681.850722.GSB>

right here in the privacy of my own office, i can reproduce this, freeze the screen, and get a hardcopy of the lossage. Isn't VMS wonderful?

From KLH at MIT-MC.ARPA Mon Jul 22 19:30:54 1985
From: KLH at MIT-MC.ARPA (Ken Harrenstien)
Date: Jul 22 85 13:30:54 EDT
Subject: RMAIL display problem
Message-ID: <[MIT-MC.ARPA].584298.850722.KLH>

Some additional data which supports the theory that a user-program or ITS TTY bug may be involved:

Date: Jul 18 85 22:40:40 EDT
From: Glenn S. Burke
Subject: RMAIL display problem
To: KLH at MIT-MC.ARPA
Message-ID: <[MIT-MC.ARPA].581085.850718.GSB>

All the times i have seen such an error i have been able to find duplicated text on the screen and the supposition was that it was a duplicated tcp packet or something like that. I have seen this both internetting from ru-net to here (from a 20) and i believe just within rutgers (tops-20 -> tops-20 just on ru-net).

----------------------

Date: Jul 19 85 23:45:04 EDT
From: Glenn S. Burke
Subject: tty lossage
To: KLH at MIT-MC.ARPA
Message-ID: <[MIT-MC.ARPA].582348.850719.GSB>

well maybe i should take back what i said last night. I'm coming from a microvax vaxstation running a vt100 emulator window, running decnet to a 750 (corwin) from whence i'm doing chaosnet supdup to mc. The window size is 94 wide by 55 high [i TOLD it 96 wide at this end, you know how these things are...]

anyway, i have a two screen long (at this screen size) message, and if i have it redisplay the first and get a space (in rmail, go to next screen) before it finishes, it invariably fucks up.

anyway, there ain't no tcp in THIS network path.

From JNC at MIT-XX.ARPA Fri Jul 19 00:00:00 1985
From: JNC at MIT-XX.ARPA (J. Noel Chiappa)
Date: Fri 19 Jul 85, 00:00
Subject: memory woes
In-Reply-To: Message from "Alan Bawden " of Fri 19 Jul 85 08:34:10-EDT
Message-ID:

Ty was having a parity error in some module of one sector. He replaced that module and got the exact same error in the exact same place (he said). He thought that was suspicious, and decided to swap the parity controllers to see if the problem moved with the controller. It didn't. Ask him for more details.
-------

From ALAN at MIT-MC.ARPA Fri Jul 19 14:33:23 1985
From: ALAN at MIT-MC.ARPA (Alan Bawden)
Date: Jul 19 85 08:33:23 EDT
Subject: No subject
In-Reply-To: Msg of Thu 18 Jul 85 19:03:04 EDT from Christopher C. Stacy
Message-ID: <[MIT-MC.ARPA].581459.850719.ALAN>

    Date: Thu, 18 Jul 85 19:03:04 EDT
    From: Christopher C. Stacy

    Why does ITS think it has been up all year?

I guess somebody told MC it was 1984 when they first brought it up.

From ALAN at MIT-MC.ARPA Fri Jul 19 14:31:13 1985
From: ALAN at MIT-MC.ARPA (Alan Bawden)
Date: Jul 19 85 08:31:13 EDT
Subject: memory woes
In-Reply-To: Msg of Fri 19 Jul 85 07:00:43 EDT from Christopher C. Stacy
Message-ID: <[MIT-MC.ARPA].581458.850719.ALAN>

    Date: Fri, 19 Jul 85 07:00:43 EDT
    From: Christopher C. Stacy

    unusable, so I deselected sector 1 where they were happening (bank 3). Then they appeared to move to sector 0. I deselected sector 0 too.

I presume you remembered to turn interleaving off when you deselected sector 1.

    If things seem to work well for a while today, someone might want to turn sector 1 back on and see if the errors move around. We are down a lot of memory at this point.

    I heard something about an Ampex parity detection module being replaced when DEC was frobbing the machine to replace a core stack in one of their memories.

This wasn't DEC, it was TY I believe. I think JNC can tell you about it as well.

From CSTACY at MIT-MC.ARPA Fri Jul 19 13:00:43 1985
From: CSTACY at MIT-MC.ARPA (Christopher C. Stacy)
Date: Jul 19 85 07:00:43 EDT
Subject: memory woes
Message-ID: <[MIT-MC.ARPA].581432.850719.CSTACY>

The Ampex was getting enough parity errors to render the system unusable, so I deselected sector 1 where they were happening (bank 3). Then they appeared to move to sector 0. I deselected sector 0 too.

If things seem to work well for a while today, someone might want to turn sector 1 back on and see if the errors move around. We are down a lot of memory at this point.

I heard something about an Ampex parity detection module being replaced when DEC was frobbing the machine to replace a core stack in one of their memories. What's the latest scoop on the hardware?

From CSTACY at MIT-MC.ARPA Fri Jul 19 01:03:04 1985
From: CSTACY at MIT-MC.ARPA (Christopher C. Stacy)
Date: Jul 18 85 19:03:04 EDT
Subject: No subject
Message-ID: <[MIT-MC.ARPA].580878.850718.CSTACY>

Why does ITS think it has been up all year?

From KLH at MIT-MC.ARPA Thu Jul 18 11:55:23 1985
From: KLH at MIT-MC.ARPA (Ken Harrenstien)
Date: Jul 18 85 05:55:23 EDT
Subject: RMAIL display problem
Message-ID: <[MIT-MC.ARPA].580138.850718.KLH>

I'm not sure where the fault for this one might be, hence the shotgun message.
In RMAIL, when using "space" to step through successive screenfuls of a long message, sometimes output fails to stop at the mode line; it continues for several more lines and runs right off the bottom of the screen, causing the terminal to either scroll or wrap up to the top (depending on one's terminal). The screen is then permanently messed up until a complete redisplay is forced with ^L.

This happens for me when connected to MC either via SUPDUP (ie as a software TTY) or via TELNET with a :TCTYP DM2500 declaration. At first I thought it might be a CRTSTY/SUPDUP problem, but my TELNET experiments have convinced me that it really is MC's fault.

However, I haven't been able to find a foolproof way of reproducing the lossage. All I can say is that in the course of reading through several SF-LOVERS digests on a 24x79 screen, this bug almost always crops up someplace, sometimes twice or thrice in a row. I type:

    N E ^K ^X r     ; to invoke RMAIL
    d               ; for each message

This is probably a TECO bug of some variety, but there's an off chance it might be an ITS TTY handling bug. It's even possible that some EMACS code is screwing up the redisplay. This has happened for quite a while (several months). I hope someone else has a notion of what to look for at this point. If necessary, I could try again to save a reproducible instance of this, although it is a rather painful task.

From ALAN at MIT-MC.ARPA Fri Jul 12 15:13:59 1985
From: ALAN at MIT-MC.ARPA (Alan Bawden)
Date: Jul 12 85 09:13:59 EDT
Subject: crtsty lossage.
In-Reply-To: Msg of Fri 12 Jul 85 02:18:26 EDT from Christopher C. Stacy
Message-ID: <[MIT-MC.ARPA].573277.850712.ALAN>

I renamed SYSBIN;CRTSTY OBIN => CRTSTY BIN => CRTSTY XBIN. I presume the behavior we were observing was something that KLH introduced when he assembled CRTSTY yesterday.

BTW, I notice there is a link from TS NCRTSTY to SYSBIN;CRTSTY NBIN, which despite its name is a year older than any other version. Is there a reason for this?
From CSTACY at MIT-MC.ARPA Fri Jul 12 08:18:26 1985
From: CSTACY at MIT-MC.ARPA (Christopher C. Stacy)
Date: Jul 12 85 02:18:26 EDT
Subject: crtsty lossage?
Message-ID: <[MIT-MC.ARPA].572965.850712.CSTACY>

I just dumped a dead CRTSTY into CRASH;CRTSTY VT100. This was EAK CRTSTY,

    (.VALUE;IOCH7;) 70110>>SKIPGE @413   130624/   4,,27000

I guess it got a fatal interrupt, but I don't know anything about this program. There were n of these guys lying around, all stopped in the same way.

From CENT at MIT-MC.ARPA Fri Jul 12 06:56:15 1985
From: CENT at MIT-MC.ARPA (Pandora B. Berman)
Date: Jul 12 85 00:56:15 EDT
Subject: crtsty lossage?
Message-ID: <[MIT-MC.ARPA].572909.850712.CENT>

something's wrong. over the past several hours i have found half a dozen CRTSTYs disowned with .VALUE 200. also several ___###s disowned. i can only tell from what i see in PEEK; someone who knows more should check this.

From JNC at MIT-XX.ARPA Wed Jul 10 00:00:00 1985
From: JNC at MIT-XX.ARPA (J. Noel Chiappa)
Date: Wed 10 Jul 85, 00:00
Subject: level2 bughalt
In-Reply-To: Message from "Daniel Weise " of Sun 7 Jul 85 17:07:05-EDT
Message-ID:

Well, the TARAKA DMPCPY business is a daemon copying the dump from the swapping area (where DDTDSK put it) into the real file system. The file seems to have gone into the directory '.' rather than 'CRASH'; '.' is the default directory used by DDTDSK.

    Noel
-------

From TAFT at MIT-MC.ARPA Mon Jul 8 20:54:11 1985
From: TAFT at MIT-MC.ARPA (Jonathan D. Taft)
Date: Mon, 8 Jul 85 14:54:11 EDT
Subject: No subject
Message-ID: <[MIT-MC.ARPA].567927.850708.TAFT>

DSK:UNIT 1 LOSING.RH CONI BITS= 1,,157236
Dumped to CRASH;UNIT1 LOSING

From DANIEL at MIT-MC.ARPA Sun Jul 7 23:05:05 1985
From: DANIEL at MIT-MC.ARPA (Daniel Weise)
Date: Sun, 7 Jul 85 17:05:05 EDT
Subject: level2 bughalt
Message-ID: <[MIT-MC.ARPA].566858.850707.DANIEL>

MC wedged itself again this afternoon. I took crash dump to CRASH LEVEL2 and warm booted.
But during warm boot it printed something like

    TARAKA DMPCPY . CRASH LEVEL2 DELRNM

(MC's vt52 is missing so I am typing this in from memory). When I looked for crash;crash level2 the file wasn't there. What did I do wrong?

    Daniel

From MOON at MIT-MC.ARPA Fri Jul 5 06:41:26 1985
From: MOON at MIT-MC.ARPA (David A. Moon)
Date: Fri, 5 Jul 85 00:41:26 EDT
Subject: IMP down detection
Message-ID: <[MIT-MC.ARPA].565142.850705.MOON>

    Date: Mon 24 Jun 85 17:54:25-EDT
    From: "J. Noel Chiappa"

    I think that it's a known bug with the IMP code that if the IMP cycles after ITS is up ITS doesn't deal with that correctly.

It used to work. Maybe I broke it when I removed support for NCP protocol a few years ago.

From MOON at MIT-MC.ARPA Fri Jul 5 06:07:06 1985
From: MOON at MIT-MC.ARPA (David A. Moon)
Date: Fri, 5 Jul 85 00:07:06 EDT
Subject: oh, yeah
Message-ID: <[MIT-MC.ARPA].565121.850705.MOON>

    Date: Mon, 1 Jul 85 17:20:31 EDT
    From: Christopher C. Stacy

    It is probably a bug that ITS refuses all Chaosnet service when it is being debugged. There is a feature in NETWRK for doing this in the Chaosnet specific code. SYSDBG is maybe not quite fine-grain enough? Also, should move TCPUSW and TCPUP to be near NETUSW, etc.

Yeah, probably it should assume NETUSW is good enough.

From GSB at MIT-MC.ARPA Tue Jul 2 05:53:18 1985
From: GSB at MIT-MC.ARPA (Glenn S. Burke)
Date: Mon, 1 Jul 85 23:53:18 EDT
Subject: crash
Message-ID: <[MIT-MC.ARPA].562107.850701.GSB5>

crash;ucprl5 +1

From JNC at MIT-XX.ARPA Mon Jul 1 00:00:00 1985
From: JNC at MIT-XX.ARPA (J. Noel Chiappa)
Date: Mon 1 Jul 85, 00:00
Subject: crashdump
In-Reply-To: Message from "Alan Bawden " of Sat 29 Jun 85 05:43:22-EDT
Message-ID:

I'm dubious about this being a side effect of DEC playing around. The exact thing DEC did involved no frobbing with cables at all; all they did was disable some of the ports the processor was using to reference memory. (I was supposed to explain all this but forgot).
What they did was notice that we are running the machine in two way interleave. Exactly why we are doing this is too long to explain here, and I think Dave Moon did so once already. This being the case, they decided that we would be less likely to have 'interference' on the memories if we disabled KBUS2 and KBUS3 (which are after all duplicates of KBUS0 and KBUS1 in two way interleave). I'm not sure I believe this, but I do believe that it can't hurt and I didn't feel like arguing with the DEC guy about it, so I let them do it. (Actually, they disabled the memory ports to those busses, not the busses.)

However, the DF10 has its own memory bus and port, and should not have been affected. It's probably a flaky caused by all the power cycling in the last week.

    Noel
-------

From DEVON at MIT-MC.ARPA Tue Jul 2 03:00:22 1985
From: DEVON at MIT-MC.ARPA (Devon S. McCullough)
Date: Mon, 1 Jul 85 21:00:22 EDT
Subject: No subject
Message-ID: <[MIT-MC.ARPA].561917.850701.DEVON>

I snarfed an E job when I already had one (to see what I could see) and it renamed it E!!!!! well that's okay, but wouldn't E! or E0 have done?

From CSTACY at MIT-MC.ARPA Mon Jul 1 23:20:31 1985
From: CSTACY at MIT-MC.ARPA (Christopher C. Stacy)
Date: Mon, 1 Jul 85 17:20:31 EDT
Subject: oh, yeah
Message-ID: <[MIT-MC.ARPA].561691.850701.CSTACY>

It is probably a bug that ITS refuses all Chaosnet service when it is being debugged. There is a feature in NETWRK for doing this in the Chaosnet specific code. SYSDBG is maybe not quite fine-grain enough? Also, should move TCPUSW and TCPUP to be near NETUSW, etc.