38-Bit Products

# POWER SPECTRUM ANALYSIS WITH THE FPS-5000 SERIES



### **INTRODUCTION**

The power spectrum analysis of time series data is a powerful tool for SONAR, RADAR, and speech processing. In many application areas such as noise analysis, design of digital filters, and signal tracking, spectrum analysis is a common and convenient technique [1]. At the heart of most spectrum estimation applications is the Fast Fourier Transform (FFT) [2]. The FPS-5000 Series array processor obtains high performance on FFTs and other signal processing algorithms. This article describes how the FPS-5410 array processor is used on a simple peak spectral energy tracking problem to achieve almost 3.9 times the performance of an FPS-100.

### **FPS-5000 SERIES ARCHITECTURE**

The FPS-5000 Series family is based on a distributed signal processing system concept. High throughput processing requirements are achieved by distributing various sub-tasks onto individual system elements. The FPS-5000 Series array processor (shown in Figure 1) includes a Control Processor used for host process communication and control, a large system common memory, I/O Coprocessors, and up to three Arithmetic Coprocessors. Key elements of the FPS-5000 Series used in this application include the Arithmetic Coprocessor, the GPIOP I/O Coprocessor, and the System Common Memory, all of which are described in the following sections.



Figure 1. FPS-5410 system diagram.

# ARITHMETIC COPROCESSOR ARCHITECTURE

The Arithmetic Coprocessor is the key architectural element providing high performance in the FPS-5000 Series. Multiple Arithmetic Coprocessors may be configured in a distributed processing system managed by the Control Processor. Shared access to the System Common Memory provides a common data base and communications path between processors. Shown in Figure 2, the Arithmetic Coprocessor is a specialized array processor with an internal architecture optimized for the FFT and complex vector computations [3]. Operating on a synchronous 6-MHz clock, the Arithmetic Coprocessor includes one floating-point multiplier and two floating-point adders with 32 bits of precision. The internal memory is fast enough to allow two memory operations by the Arithmetic Coprocessor and one access by the DMA controller on each cycle, with no impact on processor performance.



Figure 2. Arithmetic Coprocessor architecture.

# GPIOP I/O COPROCESSOR ARCHITECTURE

The GPIOP (General-purpose Programmable I/O Processor) is an interface processor designed to provide a flexible high-speed path into the FPS-5000 Series array processor from external devices (see Figure 3). The GPIOP includes two processing elements: a 20-bit wide bit-slice processor used for address calculations and device protocol, and a format processor for fix/float and pack/unpack operations. The format processor (FPROC) is supplied with over 55 format conversion routines that provide on-the-fly conversion between many popular data formats. The GPIOP is a powerful device that is easily adapted for connection to A/D and D/A converters, tape drives, disks, bulk memory, display systems, and real-time control equipment.
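The kind of fix/float and unpack conversion described above can be illustrated with a short sketch. This is plain Python, not FPROC code: the function name `unpack_fix_to_float`, the big-endian signed 16-bit sample layout, and the scaling to the range [-1.0, 1.0) are all assumptions chosen for illustration, since the article does not specify the A/D data format.

```python
import struct

def unpack_fix_to_float(raw):
    """Unpack big-endian signed 16-bit samples (assumed format) and
    scale each one to a float in the range [-1.0, 1.0)."""
    count = len(raw) // 2
    ints = struct.unpack(f">{count}h", raw[: count * 2])
    return [value / 32768.0 for value in ints]

# Three packed samples, as a hypothetical A/D converter might deliver them.
raw = struct.pack(">3h", 0, 16384, -32768)
samples = unpack_fix_to_float(raw)
print(samples)  # [0.0, 0.5, -1.0]
```

Performing this conversion in the I/O path means the arithmetic stage only ever sees floating-point data.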

# FINDING THE PEAK POWER FREQUENCY

As an example of power spectrum calculations, consider the task of continuously estimating the frequency with the highest received power within each frame of 1024 real input data points. Figure 4 shows the dataflow organization and the processing resources of the FPS-5000 Series used to implement this application.









The signal is sampled by an A/D converter under direct control of the GPIOP. In this example the GPIOP manages double 1K input buffers in System Common Memory. The Arithmetic Coprocessor then transfers data from these buffers to its local memory, where it performs a Hamming windowing operation and a forward real FFT [4]. The complex spectrum is moved out of the Arithmetic Coprocessor and back into System Common Memory, where the Control Processor performs power conversion and peak detection. The resulting frequency pointer is passed to the host for analysis or display, and the entire process is repeated for every 1K block of input samples. Because of the highly parallel architecture of the FPS-5000 Series array processor, most of the data transfers and control of each processing element are overlapped with computation.
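The four arithmetic steps in this dataflow (window, forward FFT, power conversion, peak search) can be sketched in plain Python. This is an illustrative stand-in, not the FPS math library: the function names, the recursive radix-2 FFT, the 512 kHz sample rate, and the synthetic test tone are assumptions introduced here.

```python
import cmath
import math

def hamming(n):
    """Hamming window coefficients, computed once and reused for every frame."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out

def peak_frequency(frame, window, sample_rate):
    """Return (bin index, frequency in Hz) of the strongest spectral line."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, window)]    # window by vector multiply
    spectrum = fft(windowed)[: n // 2 + 1]               # real input: keep bins 0..n/2
    power = [z.real * z.real + z.imag * z.imag for z in spectrum]  # magnitude squared
    k = max(range(len(power)), key=power.__getitem__)    # peak search
    return k, k * sample_rate / n

N = 1024
SAMPLE_RATE = 512_000.0   # hypothetical A/D rate in Hz
window = hamming(N)
frame = [math.sin(2.0 * math.pi * 100 * i / N) for i in range(N)]  # test tone in bin 100
bin_index, freq = peak_frequency(frame, window, SAMPLE_RATE)
print(bin_index, freq)
```

The bin index plays the role of the frequency pointer passed to the host; multiplying it by the sample rate divided by the frame length converts it to hertz.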

## IMPLEMENTATION

The power spectrum and maximum search operations are easily performed using standard FPS-5000 Series math library routines. By estimating the performance of the appropriate routines, timing predictions can be made for this power spectrum analysis application. The entire data collection phase is controlled by the programmable GPIOP, which executes the interface protocol with the A/D converter and manages buffers and flags in the System Common Memory.

The Arithmetic Coprocessor is able to fully overlap DMA and channel list interpretation with arithmetic processing as long as calculations require more time than control and DMAs. Similarly, the Control Processor performs arithmetic processing as well as overall process control. In this example the Control Processor's arithmetic operations and control overhead together take less than the Arithmetic Coprocessor processing time, and are completely overlapped using double buffers. Therefore the total frame time is limited to the VMUL and RFFT operations in the Arithmetic Coprocessor. In Figure 5, the model FPS-5410 is shown in a configuration that includes 256K of System Common Memory and one Arithmetic Coprocessor.

An equivalent implementation in which an FPS-100 executes all of the processing is shown below for comparison. For the FPS-100 the frame time is equivalent to the arithmetic process time because all buffers are held in Main Data memory and the GPIOP manages all input buffers.

| PROCESS                        | FPS-5410 Arithmetic Coprocessor | FPS-5410 Control Processor | FPS-100 |
|--------------------------------|---------------------------------|----------------------------|---------|
| 1. Hamming (VMUL)              | 276                             |                            | 772     |
| 2. Frequency (RFFTB)           | 1350                            |                            | 4050    |
| 3. Power<sup>2</sup> (CVMAGS)  |                                 | 1026                       | 1026    |
| 4. Maximum (MAXV)              |                                 | 515                        | 515     |
| Subtotal microseconds          | 1626                            | 1541                       | 6363    |
| Total Frame Time (µsecs)       | 1626                            |                            | 6363    |
| Bandwidth at 1K/Frame          | 620 kHz                         |                            |         |

FPS-100 Time / FPS-5410 Time = 6363/1626 = 3.9 Times Performance Increase

Figure 5. Performance analysis.
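The arithmetic behind Figure 5 can be checked with a few lines of Python. The routine times are copied from the figure; the variable names and the modeling of overlap as the maximum of the two processor totals are assumptions that follow the reasoning in the text, not vendor software.

```python
# Routine times in microseconds, copied from Figure 5.
AC_TIMES = {"VMUL": 276, "RFFTB": 1350}     # Arithmetic Coprocessor
CP_TIMES = {"CVMAGS": 1026, "MAXV": 515}    # Control Processor
FPS100_TIMES = {"VMUL": 772, "RFFTB": 4050, "CVMAGS": 1026, "MAXV": 515}

ac_total = sum(AC_TIMES.values())
cp_total = sum(CP_TIMES.values())

# With double buffering the two processors work on different frames at the
# same time, so the frame time is set by the slower of the two stages.
frame_5410 = max(ac_total, cp_total)

# The FPS-100 runs every routine itself, so its times simply add up.
frame_100 = sum(FPS100_TIMES.values())

speedup = frame_100 / frame_5410
print(ac_total, cp_total, frame_5410, frame_100, round(speedup, 1))
# prints: 1626 1541 1626 6363 3.9
```

Because the Control Processor subtotal (1541) is below the Arithmetic Coprocessor subtotal (1626), it is hidden entirely behind the coprocessor's work and does not add to the frame time.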

The Arithmetic Coprocessor processing element contained in the FPS-5410 is designed with independent data transfer and arithmetic processing sections, allowing concurrent loading/unloading and processing with no time penalty. The timing chart for the maximum power spectrum computation is shown in Figure 6. As shown, all data transfer operations are overlapped with the spectral calculations. Vector multiply is used for the window operator rather than an integrated Hamming routine because it yields better performance. The Arithmetic Coprocessor contains 16K of data memory, which is more than enough to hold the extra coefficient vector.



Figure 6. System timing chart.


# SUMMARY

The power and flexibility of a distributed processing system are clearly illustrated by the peak power frequency calculation. As shown above, the FPS-5410 with one Arithmetic Coprocessor achieves almost 3.9 times the performance of currently available array processors. The availability of more specialized signal processing routines and application development tools makes the FPS-5000 Series array processor a powerful tool for new system designs.

# REFERENCES

1. Blackman, R. B. and Tukey, J. W. "The Measurement of Power Spectra from the Point of View of Communications Engineering," Dover Publications, 1958.
2. Bergland, G. D. "A Guided Tour of the Fast Fourier Transform," I.E.E.E. Spectrum, July 1969.
3. Tracy, R. W. "A Distributed System Architecture for High Throughput Array Processors," submitted to the SIAM Conference on Parallel Processing, November 1983.
4. Rabiner and Gold "Theory and Application of Digital Signal Processing," Prentice-Hall, 1975.
5. "Power Spectrum Calculation with the Analogic AP400," Application Tips, Analogic Corp., Wakefield Mass., 1982.



Ian Curington is a Product Specialist with Floating Point Systems, Inc., where his responsibilities include technical marketing of I/O products, graphics, and image processing applications for 38-bit products. Before assuming his present position, Mr. Curington was a Systems Engineer in the FPS Performance Analysis group. Prior to joining FPS in 1979, he worked as a Systems Programmer on real-time data acquisition and control systems. Mr. Curington holds a BS degree in mathematics from Lewis and Clark College. His professional interests include computer graphic architectures, digital image synthesis, and motion dynamics. [800] 547-1445, extension 783.

# NEW 38-BIT RELEASES

# AP190L D04-000 RELEASE NOTES FOR IBM CMS (IB03)

# 1 DEVIATIONS FROM STANDARD RELEASE

### 1.1 AP-FORTRAN

APFTN38 is available as optional software.

### 1.2 Chained APEX (CAPEX)

Chained APEX (CAPEX) processing is provided. If reading or writing extended memory (page select) registers in chain mode, the channel program is forced to execute with a call to APEXC. There is a 400 CCW limit in one channel program. If this limit is reached, the channel program is forced to execute with a call to APEXC.
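The forced-execution behavior can be pictured with a small model. This is a hypothetical Python sketch, not the APEX interface: the class `ChainBuffer`, its method names, and the choice to queue a page-select access before forcing execution are assumptions made for illustration. Only the 400 CCW limit and the two forcing conditions come from the note above.

```python
CCW_LIMIT = 400  # per-channel-program limit stated in the release note

class ChainBuffer:
    """Hypothetical model of chain-mode batching (not the real APEX API)."""

    def __init__(self, limit=CCW_LIMIT):
        self.limit = limit
        self.pending = []    # CCWs queued but not yet executed
        self.executed = []   # sizes of the channel programs already run

    def flush(self):
        """Stand-in for the forced call to APEXC: run whatever is queued."""
        if self.pending:
            self.executed.append(len(self.pending))
            self.pending = []

    def add(self, ccw, touches_page_select=False):
        """Queue one CCW; force execution on a page-select access or at the limit."""
        self.pending.append(ccw)
        if touches_page_select or len(self.pending) >= self.limit:
            self.flush()

chain = ChainBuffer()
for i in range(1000):
    chain.add(("write", i))
chain.add(("read_page_select", 0), touches_page_select=True)
print(chain.executed)  # [400, 400, 201]
```

In this model a long chain is split into channel programs of at most 400 CCWs, and any extended memory register access ends the current program early.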

### 1.3 Control-bit 5 Interrupts

Control-bit 5 interrupts are not supported. The related APEX routines call APSTOP.

### 1.4 DMA Overlap

An FPS-defined DMA is equivalent to an IBM channel program. In step mode a DMA operation can be started while the AP is running. Control will return to the user before completion of the DMA. Any subsequent access request will be suspended until the DMA completes. In chain mode a DMA can be performed while the AP is running.

The AP cannot be started while a DMA is in progress.

### 1.5 Format Registers

AP and host access to the format (high and low) registers and the IFSTAT register is not supported.

### 1.6 HMA and WC Registers (AP)

Access to the HMA and WC registers from within the AP is not supported.

### 1.7 HMA and WC Registers (Host)

Access to the HMA and WC registers from the host is simulated in software. These values are never actually written to or read from the AP.