RT-11/TSX-Plus System Timing

			      Jim Crapuchettes
			      Omnex Corporation
			  2483 Old Middlefield Way
			  Mountain View, CA. 94043
			       (415) 966-8400

			 Spring 1987 DECUS Symposium
			       Nashville, TN


		   (c) Copyright 1987, Omnex Corporation


I.	A little history--

In late 1985 we began work on our Q-bus Ethernet handler for the DEQNA, which
later became NQ-Plus, a superset of the DEC NQ handler.  A significant re-
quirement in our design was support for what we call "split headers"; in add-
ition to the data being sent, Ethernet frames contain at least a 48 bit source
address, a 48 bit destination address and a 16 bit protocol type.  Often, add-
itional information is required.  In the RT-11 NQ, this network-related infor-
mation must be stored in memory immediately preceding the data itself; this
often means that the data must be copied in memory (sometimes more than once)
to assemble a complete Ethernet frame before it can be sent.  So we wanted to
be able to have a user separately specify the memory area where this informa-
tion about the data is located.

Since a major goal of this design was to maximize DEQNA performance, we wanted
to be sure that the method we selected for specifying the header area did not,
in fact, degrade performance.  We had several possible ideas for specifying
the area, but wanted to know which was best.

I am a believer in measuring program timing when it can be done.  We have a
KWV11-C equivalent programmable real-time clock, so I wrote a program to use
the clock to time a system call.  Since we wanted to know how long it would
take to call a handler with an I/O request, the program timed a very simple
I/O request, a wait-mode read of 2 words from block 0 of the NL: handler.
The clock was set up to run with a 1MHz timebase (1 microsecond per pulse) and
the overflow interrupt was enabled to increment a high-order word on overflow.
When the operation was completed the clock was stopped and the count of pulses
from the 1 microsecond timebase was read.  After each of 5 measurements the
two words of time were converted to octal and decimal and output it to a file.

As I began using this program, I noticed that the times varied more than I ex-
pected.  So I began increasing the number of trials (time measurements); from
5 to 10, then from 10 to 50, then from 50 to 100, then from 100 to 1,000, and
finally from 1,000 to 10,000.  While doing this, I also stopped worrying about
32-bit time values (16 bits can time to 65 milliseconds) and just stored the
time in memory; when all timing trials were done, the times were converted to
decimal and output to a file.  Shortly after I began this experimentation I
realized that the Line-Time Clock was the major source of some of the varia-
tion I was seeing, but I left it enabled for all of the tests so that the time
taken by the clock interrupt service routines could be identified in the data.

RT-11/TSX-Plus System Timing, Spring 87 DECUS				Page 2


As the number of measurements increased, we had to have a way to analyze the
results.  So I also wrote a program to read in the times, perform a frequency
analysis on the values and then generate a histogram from the results.  Since
I was field testing the new RT-11 BASIC-Plus at the time, I wrote the analysis
program using that language.

As we began to look at some of the histograms and talk about what they meant,
we came to the conclusion that we absolutely did not want to do a separate
system call to specify the split header for each received or transmitted
frame.  So we chose to use a Special Function call to specify two header areas
for transmit and two for receive on each of the eight NQ units.  The headers
are described by their location and size.  After this semi-static definition,
the "block" argument for the transmit and receive Special Functions are used
to select one of the two split headers or no split header.

Having made this decision, we had no real reason to continue further study of
the timing data, but I was interested in some of the features that appeared in
the histograms.  So I did what turned out to be quite a bit of additional work
to try to better understand what we had seen.


II.	What we did--

The time measurements were made on the 2 Q-bus processors that we use regular-
ly, an 11/23 and an 11/73.  Each test was done under RT-11/SJ, RT-11/FB, RT-
11/XM and TSX-Plus.  Under RT-11/FB, RT-11/XM and TSX-Plus the tests were also
run with more than one program in memory.  We ended up with 10 test configura-
tions that were run on each of the two processors. The configurations were:

	code	configuration
	----	-------------

   1.	sj	RT-11/SJ
   2.	fb	RT-11/FB
   3.	f1	RT-11/FB with VDT in the foreground
   4.	f2	RT-11/FB with VTCOM in the foreground
   5.	xm	RT-11/XM
   6.	x1	RT-11/XM with VDT in the foreground
   7.	x2	RT-11/XM with VTCOM in the foreground
   8.	t1	TSX-Plus V6.2 with one job
   9.	t4	TSX-Plus V6.2 with WINPRT detached and 2 jobs in KMON
  10.	t5	TSX-Plus V6.2 with WINPRT detached and 3 jobs in KMON

Only part of the "f2" data was acquired.  With the "f2" configuration, there
was not enough memory for the program, VTCOM and the USR, so on one of the
tests, the USR began swapping in and out for every call; these two tests were
terminated and the data files were discarded.

I reviewed all of the RT-11 system calls and selected a small set to time.  I
specifically avoided timing I/O on any mass-storage device (since I was trying
to time the system, not the disk), but did expand the read timing tests on NL:
to include VM: (RAM disk in main memory) and QM: (RAM disk on a separate board
that we have been developing).

RT-11/TSX-Plus System Timing, Spring 87 DECUS				Page 3


The system calls selected were:

		Request		Switch	Comments
		-------		------	--------

		.DATE		/D
		.GVAL		/G
		.RCTRLO		/O
		.TRPSET		/T	#0 (reset the trap handler)
		.CSTAT		/C	Do with open channel
		.READW		/R	Only to NL:, VM: or QM:
		.WAIT		/W	Do with open (idle) channel
		.LOOKUP		/L	Do with a single file, .PURGE after
		.PURGE		/P	Done with channel purged already


In addition, I timed a "nop" test which consisted of a "mov #arg,r0" and a
"nop" to serve as a baseline for the timing values.  (This test was selected
with the /N switch).  And, as I mentioned above, the READW was done using NL:,
VM: and QM:.  However, when starting the tests with QM: on the 11/73, after
the tests with QM: on the 11/23 were finished, the results showed that the
times for VM: and QM: were so similar that the QM: tests were not finished.
With the two tests that were terminated due to USR swapping, the total number
of tests ranged between 10 and 12 for each of the tested configurations and
processors.

For reference, two examples of the code sequences used are:

	.sbttl	$nop - Point to read argument area and do a nop

	$nop::
	10$:	mov	#KW1M+KWext+KWGO,@#KWCSR	;Start the clock

		mov	#rargs,r0		;Get address of argument area
		.list	meb			;[Expand so nothing hidden]
		nop				; and do a NOP
		.nlist	meb			;[Return to normal listing]

		bis	#KWmSt2,@#KWCSR		;Trigger ST2 to get count
		bic	#KWGO,@#KWCSR		; and Stop the clock
		mov	@#KWBPR,ClkCl		;Get low part of clock time

		mov	ClkCl,(r5)+		;Save this time interval
		dec	tstcnt			;Done with all tests yet?
		bne	10$			; No, go back for another
		return				;Yes, return to caller

RT-11/TSX-Plus System Timing, Spring 87 DECUS				Page 4


	.sbttl	$date - Get the system date word

	$date::
	10$:	mov	#KW1M+KWext+KWGO,@#KWCSR	;Start the clock

		.list	meb			;[Expand so nothing hidden]
		.date				;Get the system date word
		.nlist	meb			;[Return to normal listing]

		bis	#KWmSt2,@#KWCSR		;Trigger ST2 to get count
		bic	#KWGO,@#KWCSR		; and Stop the clock
		mov	@#KWBPR,ClkCl		;Get low part of clock time

		mov	ClkCl,(r5)+		;Save this time interval
		dec	tstcnt			;Done with all tests yet?
		bne	10$			; No, go back for another
		return				;Yes, return to caller



III.	The data files--

Each data file is 41 (decimal) blocks long and consists of a one-block header
and 40 blocks of 10,000 16-bit binary time values in microseconds.  The header
block has three variable-length lines of ASCII text, terminated by Carriage-
Return/Line-Feed pairs, and at least 3 extra Carriage-Return/Line-Feed pairs;
the remainder of the block is null-filled.  The first line contains the pro-
gram name, program version, date, time and operating system name.  The second
line is a copy of the CSI command line given to the timing program; it shows
the output file name, input device (if specified) and switch specifying the
operation to be timed.  The third line is a comment line.

The data files are named the following way: "pp" is the two digit processor
type (23 or 73), "t" is the one character test code (switch code), "dv" is
either null or the two-character name of the input device for the .READW tests
and "cc" is the configuration code from above.  These pieces are combined to
form the data file or histogram file name --

	pptdv.Dcc (Data file) or pptdv.Hcc (Histogram file).

Note that none of the histogram files have been included; they can be regen-
erated as needed.

The timing data from the 10 to 12 tests, 10 configurations and 2 processors
resulted in 231 data files.  The exact complement of the data files is given
in the two tables below.  Each "X" represents a data file that is present in
the data sets on this SIG tape.  With 10,000 time values in each data file,
these files contain over 2.3 million time samples.

RT-11/TSX-Plus System Timing, Spring 87 DECUS				Page 5


		    ** Summary of data files provided **

	       SJ   FB   F1   F2   XM   X1   X2   T1   T4   T5

	23N	X    X    X    X    X    X    X    X    X    X
	23D	X    X    X    X    X    X    X    X    X    X
	23G	X    X    X    X    X    X    X    X    X    X
	23O	X    X    X    X    X    X    X    X    X    X
	23T	X    X    X    X    X    X    X    X    X    X
	23C	X    X    X    X    X    X    X    X    X    X
	23RNL	X    X    X    X    X    X    X    X    X    X
	23RVM	X    X    X    X    X    X    X    X    X    X
	23RQM	X    X    X    X    X    X    X    X    X    X
	23W	X    X    X    X    X    X    X    X    X    X
	23L	X    X    X         X    X    X    X    X    X
	23P	X    X    X    X    X    X    X    X    X    X

	       SJ   FB   F1   F2   XM   X1   X2   T1   T4   T5

	73N	X    X    X    X    X    X    X    X    X    X
	73D	X    X    X    X    X    X    X    X    X    X
	73G	X    X    X    X    X    X    X    X    X    X
	73O	X    X    X    X    X    X    X    X    X    X
	73T	X    X    X    X    X    X    X    X    X    X
	73C	X    X    X    X    X    X    X    X    X    X
	73RNL	X    X    X    X    X    X    X    X    X    X
	73RVM	X    X    X    X    X    X    X    X    X    X
	73RQM	                                   X    X    X
	73W	X    X    X    X    X    X    X    X    X    X
	73L	X    X    X         X    X    X    X    X    X
	73P	X    X    X    X    X    X    X    X    X    X


IV.	Cautions about the data--

1.	I am not a statistician; I am an engineer who has been programming
	computers in control and lab and data acquisition situations for
	about 20 years.  The result is that I don't know how to be sure that
	10,000 samples is enough to really characterize any of these tests.

2.	The KWV11-C equivalent programmable real-time clock was running with
	a 1 MHz time base.  This means that the clock resolution is 1 micro-
	second and that the maximum synchronization error is also about 1
	microsecond for starting and for reading the clock.

3.	The memory boards on each of the systems are not DEC memory and I
	did not have a way to try different memory boards to see how that
	might affect the timings.  The 11/73 has a 1Mb National Semiconductor
	memory and the 11/23 has a 256Kb National Semiconductor memory.

4.	The versions of RT-11 run on the two systems were not the same.  The
	11/73 runs TSX-Plus all of the time and has not been upgraded from
	RT-11 V5.1C yet.  The 11/23 was running RT-11 V5.2.
�����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������