From ScottatSRI-KL Fri May 30 00:00:00 1980
From: ScottatSRI-KL (Scott J. Kramer)
Date: 30 May 1980, 00:00
Subject: Well...
Message-ID:

:DDTSYM BYERUN is set non-zero, don't know exactly what.
-------

From ScottatSRI-KL Fri May 30 00:00:00 1980
From: ScottatSRI-KL (Scott J. Kramer)
Date: 30 May 1980, 00:00
Subject: PWORD weirdness
Message-ID:

Wonderful bugs:

MC ITS.1191. PWORD.1733. TTY 42 8. Lusers, Fair Share = 64%
*$$u
MC ITS 1191 Console 42 Free. 07:57:57
MC ITS.1191. PWORD.1733. TTY 42 8. Lusers, Fair Share = 74%
*sjk$^A (This person's mail is forwarded to CHNL NOT OPEN Bad file = ^Q : ^Q ; ^Q ^Q
*$^A (This person's mail is forwarded to Top level interrupt, tree detached
MC ITS 1191 Console 42 Free. 07:58:07
MC ITS.1191. PWORD.1733. TTY 42 8. Lusers, Fair Share = 72%
*sjk$^A (This person's mail is forwarded to CHNL NOT OPEN Bad file = ^Q : ^Q ; ^Q ^Q
*$$u
These are the topics for which HELP can give more info.
Type: :HELP for more info on a given topic.
ACOUNT LOGIN TCTYP LUSER ITS JCL PRINT LOGOUT SEND NAME SSTATU WHOJ BUG WHO DATE TIME TIMES TIMOON OCTPUS WHOIS HELP MAIL QSEND PRMAIL PRSEND LISTF USERS
*
*:logout
See Ya Later ___035... The plural of spouse is spice.

...and it hangs here. And somehow :DDTSYM BYERUN is set to -1. Rather bizarre...

-sjk
-------

From RWK at MIT-MC Thu May 29 00:00:00 1980
From: RWK at MIT-MC (RWK at MIT-MC)
Date: 29 May 1980 00:00
Subject: Scheduler observations
Message-ID:

Response to --MORE-- in peek can still take 15 seconds or so (although not usually). It's a lot better, but not perfect.

Also, I just gunned down a job with "1000000" interrupt according to PEEK. What is that interrupt? It took about 2 minutes for it to finally go away. In the meantime, it could not be :UJOB'd, its FLSINS was zero, its state in PEEK was *MULTICS, its inferior disappeared at about the 1 minute mark, etc.

Also, HIC and I saw a hung tree. A WHOLIN job was hung waiting for output room, which isn't surprising, but the HACTRN was hung in USRI.
It went away OK when gunned (we were going to lunch, not investigating the system). A lot better than before, but still some quirks.

From RWK Thu May 29 00:00:00 1980
From: RWK (Robert W. Kerns)
Date: 29 MAY 1980, 00:00
Subject: No subject
Message-ID:

I'm not certain SJK's suggestion should be done. When I use DIR:, I often use it in more than one mode. However, I'm probably just abnormal. Second opinion?

From RWK at MIT-MC Thu May 29 00:00:00 1980
From: RWK at MIT-MC (RWK at MIT-MC)
Date: 29 May 1980 00:00
Subject: No subject
Message-ID:

of allocated directories.

From GJC at MIT-MC Wed May 28 00:00:00 1980
From: GJC at MIT-MC (GJC at MIT-MC)
Date: 28 May 1980 00:00
Subject: No subject
Message-ID:

T36 seems to be wedged.

From RWK at MIT-MC Tue May 27 00:00:00 1980
From: RWK at MIT-MC (RWK at MIT-MC)
Date: 27 May 1980 00:00
Subject: Schedualer lossage
Message-ID:

Fooi, I sent mail before but it got lost in the crash.

Anyway, there definitely IS a problem, but it's not just that performance is bad. There is a screwy interaction at work here between interrupts and I/O and the scheduler. When a job has non-deferred pending interrupts, SCHACK does not skip-return, leading to doing a full schedule. I think this relates to the problem, in that only jobs with pending interrupts are scrod. Because they only get run when a full schedule is done, they can be heavily discriminated against under some circumstances. I'm not sure just what those circumstances are, but PEEK is screwed in --MORE-- but not elsewhere. This only happens during heavy load, presumably because the full schedule then happens more often.

These observations are based on a lot of poking around, with H19WHO keeping me informed of the state of my current job.

1) EMACS does its input in a single big SIOT, getting MPV's along the way, which it handles and creates the core. Unfortunately, these interrupts may take a very long time to go off, and it won't be scheduled until they do.
2) PEEK .SLEEP's except during the --MORE-- break, where it waits in an input .IOT ... which never returns, since when you type a space, the %PITYI interrupt is asserted in .PIRQC, but NEVER GOES OFF. If you clear this interrupt, disable %PITYI in .MASK, type the space, it reads the space. Then of course you have to restore the mask, since PEEK reads its commands at interrupt level.

3) When COMSAT loses, getting scheduled only every minute or so, I've checked, found the pages required in core, at an ordinary BLT or TLNN etc., and with a %PIRLT interrupt that WAS NOT GOING OFF. Eventually the load would drop and it would run.

4) DDT sometimes doesn't get its interrupt when the inferior is ^Z'd, while it's sitting in its .HANG. I also saw a .DTTY take 20 seconds; I don't have any other clues on that. Oddly enough, giving DDT a ^G seems to wake it up when it's hung in that manner. Perhaps the TTY driver manages to frob it in just the right way to make it run again, I don't know. Or maybe it just happened to get scheduled when I typed &^G, but that seems unlikely; why didn't it get scheduled for 15 seconds before that?

Anyway, I hope this is enough to track down the problem. The threshold of load before it shows up is fairly high, requiring many runnable jobs. Probably enough to fill SCHBTB or something and just sit there keeping it full.

From RWK at MIT-MC Tue May 27 00:00:00 1980
From: RWK at MIT-MC (RWK at MIT-MC)
Date: 27 May 1980 00:00
Subject: Slowness
Message-ID:

I'm not so sure that the reported slowness of disk I/O is anything other than very heavy disk loading. Looking at the channels listing in peek, we're often in the 20-25 channel range, and it's not uncommon for there to be 15 competitors for one disk drive. Normal is more like 15-20 range, with maybe 8-10 competitors for a drive.

Today seems to be more disk intensive than usual for some reason. TY is doing a DUMP. There are several compilers and TEX jobs plus the dover spooler and PFTHMG and ...
I've seen the system grind to a halt when there were 25 channels in use before. And indeed, when it drops back down to normal, it doesn't seem grossly slower than usual, although I can't assert it's normal.

Under the 25-channel load, I read SYSEN1;DDT > (98 blocks) into an EMACS in 110 seconds, watching with H19WHO updating every 5 seconds. Only 4 samples out of that period of 22 were in other than +DSKBI. Under conditions of about 18 channels in use the time was 65 seconds with only one sample not in +DSKBI. 65 seconds is not inconsistent with times I've observed before under "similar" load.

It has also been observed by many people that the system is much less responsive WRT disk I/O when a dump is being taken.

Disclaimer: The above observations do not show that the system is performing as well as before. My only assertion is that the data doesn't show it when balanced with the load observed and the lack of a control case. "Before" data would be necessary to make any reasonable judgements.

Further Disclaimer: The above does not mean that I think the scheduler is working 100% right. There are other clear problems with it. Have you tried continuing a --MORE-- break in PEEK lately?

BTW, ITS doesn't use an elevator algorithm for its disk I/O, does it? 110 seconds for 98 pages against say 12 competitors for one disk drive looks to me about right for randomly contending competitors, no? This works out to about 10 seconds for the single user, with having it seek the block, then wait until the next schedule to transfer the block to the user's address space (perform the SIOT) and queue up the next transfer. I.e. about 1/10 second for each pair of seek/xfer and schedule.

Maybe elevator algorithm might be worthwhile during the heavier load situations? Also, read-ahead when you encounter an extent of more than one block might help.

Maybe we should run the old system sometime in the daytime and take some performance measurements to compare with the current situation?
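The arithmetic in the message above can be checked with a short sketch. It assumes, as the message does, roughly 12 competitors taking turns on one drive; the function names are hypothetical and the even-turns model is a simplification, not a description of how ITS actually orders disk requests.

```python
def cycle_seconds(elapsed_s, blocks, competitors):
    """Seconds per seek/transfer/schedule cycle, assuming the drive
    serves each competitor in turn (an assumed model, not ITS's)."""
    return elapsed_s / blocks / competitors


def uncontended_seconds(elapsed_s, competitors):
    """Estimated time for the same read with no competing jobs."""
    return elapsed_s / competitors


# Figures quoted in the message: 98 blocks read in 110 seconds,
# against "say 12" competitors for the drive.
cycle = cycle_seconds(110, 98, 12)
alone = uncontended_seconds(110, 12)
print(f"{cycle:.3f} s per cycle, {alone:.1f} s uncontended")
```

Under these assumptions the result is a little under 1/10 second per cycle and a little under 10 seconds for a lone user, which agrees with the rough figures quoted in the message.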
From EAKatMIT-MC Tue May 27 00:00:00 1980
From: EAKatMIT-MC (Earl A. Killian)
Date: 27 May 1980, 00:00
Subject: ITS 1188 is slower.
Message-ID:

I agree; reading my large Babyl file into TECO took much much longer just now than it ever has under similar load.

From CBF at MIT-MC Tue May 27 00:00:00 1980
From: CBF at MIT-MC (CBF at MIT-MC)
Date: 27 May 1980 00:00
Subject: ITS 1188 is slower.
Message-ID:

ITS 1186 is slower

It seems to take longer to do some things, i.e. get into Rmail or Info. Quit starting seems to help. (i.e. ^Z $P). This claim is based on response time vs. my past experience at various loads. I haven't tried to do any quantitative measurement.

From GLS at MIT-MC Mon May 26 00:00:00 1980
From: GLS at MIT-MC (GLS at MIT-MC)
Date: 26 May 1980 00:00
Subject: No subject
Message-ID:

See MC:CRASH;HALTPC 5550 for a crash at PC 5550 (see log).

From KLH Mon May 12 00:00:00 1980
From: KLH (Ken Harrenstien)
Date: 12 MAY 1980, 00:00
Subject: Tabs
Message-ID:

ED at MIT-MC 05/12/80 21:38:18 Re: Tabs
Subject: Tabs

This line begins with a tab. However, it echoes and redisplays consistently with only one blank space at the beginning. This line begins with two tabs. It has 8 blank spaces. It appears to only happen at the beginning of the line. This is on a C100, using :tctyp c100.

This appears to be an ITS problem; I cannot make it happen via software-TTY, so it must be C100 specific.

From ___131 at MIT-MC Sat May 10 00:00:00 1980
From: ___131 at MIT-MC (___131 at MIT-MC)
Date: 10 May 1980 00:00
Subject: No subject
Message-ID:

The system console stopped typing out at 20:02 tonight. The console 11's lights are locked on ... it looks kinda out of it ... I asked RG who advised not doing anything and just waiting until the system goes down voluntarily, so I'm leaving it be.
-kmp

From CBF at MIT-MC Sat May 10 00:00:00 1980
From: CBF at MIT-MC (CBF at MIT-MC)
Date: 10 May 1980 00:00
Subject: strange occurances on T13 (the first Vadic 3467 line)
Message-ID:

It is more than obvious that the T13 is not detecting hangup. (I.e., either the modem card, the wires from it to the DH-11, the DH-11, the tables in the I/O-11, the I/O-11, the tables in ITS, ITS, the KL-10 or something in the path is broken.) Someone might look into fixing it before half the disk is taken up by bug reports.

From ASB at MIT-MC Sat May 10 00:00:00 1980
From: ASB at MIT-MC (ASB at MIT-MC)
Date: 10 May 1980 00:00
Subject: ADDENDUM TO PREVIOUS NOTE
Message-ID:

I now find that although wake-up still fails, CNTL-Z works fine. The TCTYP was apparently irrelevant.

From ASB at MIT-MC Sat May 10 00:00:00 1980
From: ASB at MIT-MC (ASB at MIT-MC)
Date: 10 May 1980 00:00
Subject: No subject
Message-ID:

I have repeatable problems characterized by the following:

[1] My equipment: ADM-3A thru VADIC 3434 over FTS 835-6985

[2] Procedure: Dial the phone no. get CXR light on VADIC indicating carrier locks. Type a , observe TXD light on vadic flash, indicating that was successfully transmitted. No answering flash of RXD light on VADIC, indicating that the return copy of is not received. Repeat the transmission and return-receive failure as many times as desired. I have done it 10 times with identical behavior.

Type :TCTYP FULL into MC. After each character I notice the TXD flash, but no RXD flash. No response yet.

Type CNTL-Z . Now the machine wakes up and transmits the banner:

MC ITS.1168.PWORD bla bla TTY whatever ## LUSERS etc bla bla

Now I find that I can log in. After doing so, I do :P O and the line corresponding to me reads

13 137 ASB P T1061 24 80 X <-

which I interpret to mean that the system thinks my terminal is T1061. Perhaps I somehow told it this, though I am unaware of having done so.

I have repeated this process successfully 3 times in the last 1/2 hour.
From ASB0 at MIT-MC Fri May 9 00:00:00 1980
From: ASB0 at MIT-MC (ASB0 at MIT-MC)
Date: 09 May 1980 00:00
Subject: No subject
Message-ID:

Logged in as ASB, I tried to log in again thru a printing terminal to get a short listing. I found that I was attached to a tree belonging to TYANG, and could not log in on my own. So I detached his tree, logged in and all was well.

From RICH Fri May 9 00:00:00 1980
From: RICH (Charles Rich)
Date: 9 MAY 1980, 00:00
Subject: No subject
Message-ID:

It is very irritating that the DVR^F command leaves the default device set to DVR: so that, for example, subsequent :PRINT commands which do not explicitly specify AI: or DSK: in the filename fail. The XGP^F command, on which DVR^F is presumably modelled, does not have this problem.

From KMPatMIT-MC Fri May 9 00:00:00 1980
From: KMPatMIT-MC (Kent M. Pitman)
Date: 9 May 1980, 00:00
Subject: Here's another one...
Message-ID:

Date: 05/09/80 05:08:05
From: DLW at MIT-AI
To: KMP
Re: MC dialup failure on 5/8/80

Just for the record, I also have been unable to make 253-6985 respond to CR.

From RWK at MIT-MC Fri May 9 00:00:00 1980
From: RWK at MIT-MC (RWK at MIT-MC)
Date: 09 May 1980 00:00
Subject: No subject
Message-ID:

If someone would be so kind as to tell me which TTY line does not get PWORD, I would fix it. No one has cared to tell me yet. It is hardly necessary to send a note to all of BUG-ITS to get this fixed. A single responsible bug report would do it. I am sure you are aware I do not have a modem in my office, let alone a vadic triple-speed.

From KMP at MIT-ML Fri May 9 00:00:00 1980
From: KMP at MIT-ML (KMP at MIT-ML)
Date: 09 May 1980 00:00
Subject: No subject
Message-ID:

People have complained to me about dialing MC (GZ) and ML (DANIEL) and getting DDT instead of PWORD on occasion in the last couple of days. Something might be mixed up.
From KMP at MIT-MC Fri May 9 00:00:00 1980
From: KMP at MIT-MC (KMP at MIT-MC)
Date: 09 May 1980 00:00
Subject: No subject
Message-ID:

I got multiple reports this evening of dialing 7985 and not waking up MC ... Something may be messed up. ASB's line was also 1/2-dpx for unknown reasons (he said he did not :Tctyp Half on it and doesn't know how it got into that mode), but he wasn't seeing anything he was typing for a while, and :Tctyp Full corrected the problem.

-kmp

From RWK at MIT-MC Fri May 9 00:00:00 1980
From: RWK at MIT-MC (RWK at MIT-MC)
Date: 09 May 1980 00:00
Subject: No subject
Message-ID:

It seems that EMACS can't be ^P'd while running a keyboard macro, because it keeps doing .CALL of TTYGET. Now while I doubt that TECO needs to be doing .CALL of TTYGET while doing a keyboard macro, on the other hand it should not require the TTY to do such a call, but should get the information out of .TTST1 and .TTST2 and TTSTS per-job variables.

From KMP at MIT-MC Sat May 3 00:00:00 1980
From: KMP at MIT-MC (KMP at MIT-MC)
Date: 03 May 1980 00:00
Subject: No subject
Message-ID:

There exists an AI:SYSENG;OS 93, 95, and 96. 95 and 96 seemed to have been victims of hacks, so I deleted them and re-installed 93 which seems to work fine.

-kmp

From ___106 at MIT-MC Sat May 3 00:00:00 1980
From: ___106 at MIT-MC (___106 at MIT-MC)
Date: 03 May 1980 00:00
Subject: OS
Message-ID:

KMP fixed OS; it had been vandalized.

From RICH Fri May 2 00:00:00 1980
From: RICH (Charles Rich)
Date: 2 MAY 1980, 00:00
Subject: No subject
Message-ID:

The :OS program seems to be broken. Regardless of whether it is given a logged in user name or not, it just does a and kills itself.
From ED Thu May 1 00:00:00 1980
From: ED (Ed Schwalenberg)
Date: 1 MAY 1980, 00:00
Subject: infinite translation
Message-ID:

GLS at MIT-AI 05/01/80 17:11:40 Re: infinite translation
Subject: infinite translation

Maybe making an infinite translation may be permitted, but ITS should certainly put an upper limit on the number of translation iterations (it does the same for links). An obvious such limit is the size of its internal translation table (which is fixed)!

ITS already does this. EMACS attempts to open the ERR: device, and if that open fails, for ANY REASON including Too Many Translations, it simply tries again. The historical reason for this may lie in the fact that many programs (including TECO itself) were exhibiting the malfeasance of failing to CLOSE the ERR device when done, and the system would occasionally get wedged due to this. A suitable PEEK mode was invented, and TECO at least was fixed to close ERR when done, but this business of reexecuting a failed OPEN was not.

From GLS at MIT-AI Thu May 1 00:00:00 1980
From: GLS at MIT-AI (GLS at MIT-AI)
Date: 01 May 1980 00:00
Subject: infinite translation
Message-ID:

Maybe making an infinite translation may be permitted, but ITS should certainly put an upper limit on the number of translation iterations (it does the same for links). An obvious such limit is the size of its internal translation table (which is fixed)!

From ED Thu May 1 00:00:00 1980
From: ED (Ed Schwalenberg)
Date: 1 MAY 1980, 00:00
Subject: infinite translation
Message-ID:

The real bug here is in Emacs, which if the OPEN of the ERR: device fails simply retries. I complained about this some time ago. The fact that DDT will create an infinite translation entry of this sort by way of the user typing , is perhaps at fault.
Maybe there should be YET ANOTHER DDT flag, or an improvement to one of the existing ones, which would turn off nearly all short commands which naive users would never want (, ,  are the "destructive" ones that come immediately to mind).

From SJK at MIT-ML Thu May 1 00:00:00 1980
From: SJK at MIT-ML (SJK at MIT-ML)
Date: 01 May 1980 00:00
Subject:  DIR: ...
Message-ID:

This isn't really a bug, just a suggestion. When one is using the DIR: "device" and does, for example,  DIR:LISP;CDATE DOWN and then does a  .INFO.; (i.e. no change to FN1 or FN2) the DIR: has been changed back to DSK: and the command barfs. But if one changes only FN1 and/or FN2 then a MODE NOT AVAILABLE error occurs. This really seems to be the opposite of what should happen: DIR: should be "sticky" if only the directory name is changed, otherwise it can revert back to DSK: if filenames are changed. I'm open to arguments for its present method of operation but vote that this be changed in the future if possible. Thanks.

-sjk

From ___010 at MIT-MC Thu May 1 00:00:00 1980
From: ___010 at MIT-MC (___010 at MIT-MC)
Date: 01 May 1980 00:00
Subject: infinite translation
Message-ID:

DUFTY had a translation of IO *:*;* * => *:*;* * in his EMACS. When it runs, it runs infinitely in OPEN. ITS really should forbid such a translation, since it causes the system to lose totally, with the time hidden from view (it does not show up in PEEK). Fair share dropped to 2%, with about 10% going to users, and the rest just disappearing.

Anyway, probably the "right" thing to do is for ITS to detect attempts to define infinite translations, and return an error. Of course, DDT could detect this obvious case.
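The two remedies discussed in this thread (a cap on translation iterations equal to the table size, and rejecting a looping translation when it is defined) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the table is modelled as a plain name-to-name mapping with no wildcard matching, and `resolve`, `define`, and `TooManyTranslations` are hypothetical names, not ITS or DDT routines.

```python
class TooManyTranslations(Exception):
    """Raised when following translations exceeds the iteration cap."""


def resolve(name, table, limit=None):
    """Follow translations until none applies.

    The default cap is the table size: a chain with no loop can use
    each entry at most once, so more steps than that means a loop.
    """
    if limit is None:
        limit = len(table)
    for _ in range(limit + 1):
        if name not in table:
            return name
        name = table[name]
    raise TooManyTranslations(name)


def define(table, src, dst):
    """Add src => dst, refusing entries that would translate forever."""
    trial = dict(table)
    trial[src] = dst
    try:
        resolve(src, trial)
    except TooManyTranslations:
        raise ValueError(f"translation {src} => {dst} would loop")
    table[src] = dst


table = {}
define(table, "A", "B")
define(table, "B", "C")
print(resolve("A", table))
```

In this simplified model an identity entry such as `define(table, "X", "X")`, the analogue of the `*:*;* * => *:*;* *` case reported above, is rejected at definition time, and the table is left unchanged when a definition is refused.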