Motion Detection Technologies
Presented at the 39th Annual Meeting of the Institute of Nuclear Materials Management, July 1998
CANBERRA
8401 Washington Place, N.E.
Albuquerque, NM 87113
Tel: (505) 828-9100
Fax: (505) 828-9115
email: vthompson@canberra-abq.com
W. Horak, A. Reisman
Brookhaven National Laboratory
P.O. Box 5000
Upton, NY 11973
Tel: (516) 344-2627
Fax: (516) 344-5344
A. Nadezhdinskii, S. Rudov, L. Medvedev, M. Spiridonov
General Physics Institute
Vavilova 38
Moscow, Russia 117942
Abstract
Ongoing collaboration in the area of Safeguards surveillance between CANBERRA, the General Physics Institute (GPI), and the International Safeguards Project Office at Brookhaven National Laboratory (BNL) has achieved significant advances in front-end motion detection capabilities, also known as "Smart Video", for digital Safeguards systems. While research and refinement are expected to continue until the year 2000, this project has seen major progress throughout the past year. The algorithms and software for benchmarking single and multi-camera systems have been developed and are in the process of refinement. It has been determined that a single computer is not sufficient for the concurrent tasks of digitizing the camera signal, grabbing images, calculating and displaying results, etc. Hence, a master-slave network has been developed to speed up the process. This paper demonstrates the ability of Smart Video to automatically identify and recognize the movement of sensitive items within the camera's field of view.
Introduction
The concept of "Smart Video" originated with the Arms Control Treaty Verification program. However, experience with remote monitoring indicates that this technology offers significant benefits to Safeguards. Consequently, a small international research and development activity was initiated with the goal of refining the concepts and establishing the feasibility of using Smart Video in the Safeguards regime. Funding and program management are provided by the International Safeguards Project Office at Brookhaven National Laboratory; technical direction and application analysis are provided by CANBERRA; and the actual development of the Smart Video prototype is being performed by researchers at the General Physics Institute in Moscow.
For many years there have been efforts to develop technologies that can reduce the volume of data collected in Safeguards surveillance scenarios. Front-end motion detection (FEMD) has been under development for Safeguards use for quite some time. Within the past few years FEMD technology has matured to the point that it has been deployed for large-scale remote monitoring field trials. Smart Video has its roots in the basic scene-change detection that underlies FEMD.
The intent of Smart Video is that, without human direction or intervention, the camera system will detect movement of objects, identify any "objects of interest", and record their trajectory. Smart Video can dramatically reduce the volume of data collected in an unattended or remote monitoring scenario by filtering out motion alarms that do not involve specific objects of interest. We anticipate that the ability to recognize specific objects and record their movements will be of immediate benefit in safeguarding activities such as fuel transfers, materials movements, and refueling. These activities are so complex that they currently either require an inspector to be physically present or cause the monitoring systems to generate huge amounts of data, much of it redundant, that must be reviewed by the inspector.
Background
Today's motion detecting cameras all use variations of the same method for detecting scene changes. Conceptually, each successive picture is compared to a reference image pixel by pixel. If the number of pixels that are different exceeds some threshold, the picture is declared to be different from the reference image. In practice it is necessary to considerably preprocess the pictures to suppress random noise from CCD dark currents and other sources. A simplified conceptual representation of the FEMD process is:
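An alarm is declared when

$$\sum_j \rho_j \, \Theta\left(|\Delta_j| - \varepsilon\right) > \delta$$

where $\delta$ is the alarm threshold, $\Theta$ is the unit step function, and where

$$\Delta_j = B(j) - R(j)$$

is the pixel by pixel difference between the current picture $B$ and the reference image $R$ at pixel $j$, the factor $\rho_j$ is a noise suppression term, and $\varepsilon$ is the difference sensitivity threshold.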
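The comparison described above can be illustrated with a minimal sketch. It assumes grey-level pictures stored as nested lists, and it assumes that the noise suppression term acts as a per-pixel weight on the count of changed pixels; the function name `scene_changed` and its parameters are hypothetical and are not taken from any deployed FEMD implementation.

```python
def scene_changed(current, reference, epsilon, delta, rho=None):
    """Return True when the weighted count of changed pixels exceeds delta.

    current, reference: equal-sized 2-D lists of grey levels.
    epsilon: difference sensitivity threshold applied to each pixel.
    delta: alarm threshold on the (weighted) number of changed pixels.
    rho: optional 2-D list of noise-suppression weights (assumed 1.0 if omitted).
    """
    changed = 0.0
    for r, (cur_row, ref_row) in enumerate(zip(current, reference)):
        for c, (cur, ref) in enumerate(zip(cur_row, ref_row)):
            if abs(cur - ref) > epsilon:
                changed += 1.0 if rho is None else rho[r][c]
    return changed > delta


reference = [[10, 10, 10, 10] for _ in range(4)]
flicker = [[11, 9, 10, 11] for _ in range(4)]   # small pixel noise only
intruder = [row[:] for row in reference]
for r, c in [(1, 1), (1, 2), (2, 1), (2, 2)]:   # a bright 2 x 2 object
    intruder[r][c] = 200

print(scene_changed(flicker, reference, epsilon=5, delta=3))    # False
print(scene_changed(intruder, reference, epsilon=5, delta=3))   # True
```

In this sketch the small grey-level fluctuations stay below the per-pixel threshold and raise no alarm, while the four strongly changed pixels exceed the alarm threshold of three.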
There are many different variations of this process that are actually implemented. For example, the discrete cosine transform (DCT) compression coefficients can be compared on a block by block basis. In any case, it is assumed that the change in the pixel values is due to motion in the camera's field of view. However, motion is not the only thing that can cause sufficient differences to generate an alarm. Fluctuations in lighting, camera flicker, glint and other phenomena contribute to a large number of false alarms. Furthermore, the camera detects and records all scene changes, most of which are incidental to the actual surveillance and have no Safeguards significance. Despite these limitations, current FEMD technology has dramatically reduced the volume of data that must be collected and reviewed compared to traditional time-lapse surveillance methods. Smart Video seeks to reduce both false alarms and incidental alarms by extending the FEMD process to consider changes to the geometry of the field of view, and then using the geometry information to classify and identify objects in motion.
Most of the historical work to detect and categorize objects was directed towards satellite imagery, for example detecting, identifying, and counting the airplanes on a military base. The techniques developed for this application depend on certain boundary conditions that hold in that scenario, such as fixed vertical aspect and minimal motion. There was no attempt to actually detect and track motion in the Safeguards sense, and the orientation of the "objects of interest" was relatively fixed and constant. Much of the more recent work has been based on texture-mapping in color-space, which is unavailable in Safeguards imagery. The vastly different boundary conditions that hold in the Safeguards regime required that new methods for detecting and characterizing objects be developed.
Technical Approach
There is no guarantee that any object of interest will remain oriented in any particular way with respect to a camera's field of view; that is, the shape of an object as seen by the camera cannot be known a priori. For example, a cylindrical materials container appears as a rectangle when viewed from the front and as a circle when viewed from the top. In order to identify the object in an arbitrary orientation, it is necessary to characterize the field of view and the object in three dimensions (3-D). Therefore it is known a priori that the volume of interest must be monitored by at least three cameras that are not coplanar with the objects under surveillance.

Figure 1
To simplify the interactions of multiple cameras, the approach was to segment the workload so that each of the cameras could preprocess its own field of view and the results from all cameras could be combined in a central server to create the 3-D image models (Figure 1). This segmentation approach allowed early efforts to focus on developing the pre-processing algorithms that must be performed in each camera's local 2-D coordinate system. Results from all cameras are combined at a server which translates the results into a common 3-D reference frame. It is only in the 3-D coordinate system of the server that it is possible to estimate the size, shape, moments, and motion of the object in order to determine whether it represents one of the cataloged "objects of interest."
Work to date has concentrated on ensuring that each camera is able to isolate any moving object within its field of view while rejecting scene changes due to fluctuations in lighting. This is done by tracing movement of the centroid (in the camera frame of reference) of the pixels that represent the changes from scene to scene in a sequence of pictures (Figure 2).

Figure 2
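Designating the picture taken at time n as $B_n$, the pixels in that picture are designated $B_n(x_i, y_j)$. As with the basic FEMD processes, the thresholded difference image, $D_n$, at time n is then constructed by subtracting successive pictures pixel by pixel as follows:

$$D_n(x_i, y_j) = \begin{cases} |B_n(x_i, y_j) - B_{n-1}(x_i, y_j)|, & |B_n(x_i, y_j) - B_{n-1}(x_i, y_j)| > \varepsilon \\ 0, & \text{otherwise} \end{cases}$$

where $\varepsilon$ is the difference sensitivity threshold. At this point the FEMD process is extended to estimate the geometry of the "blob" of remaining pixels in the difference image. The projections of the difference image along the X and Y axes are computed in the normal way:

$$P_x(x_i) = \sum_j D_n(x_i, y_j), \qquad P_y(y_j) = \sum_i D_n(x_i, y_j)$$

The average value of the axis projections is used to locate the centroid of the pixel cluster:

$$x_0 = \frac{\sum_i x_i \, P_x(x_i)}{\sum_i P_x(x_i)}, \qquad y_0 = \frac{\sum_j y_j \, P_y(y_j)}{\sum_j P_y(y_j)}$$

The centroid provides a rough estimate of the center of the moving object. Finally, the standard deviations in both axes are computed:

$$\sigma_x = \sqrt{\frac{\sum_i (x_i - x_0)^2 \, P_x(x_i)}{\sum_i P_x(x_i)}}, \qquad \sigma_y = \sqrt{\frac{\sum_j (y_j - y_0)^2 \, P_y(y_j)}{\sum_j P_y(y_j)}}$$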
The standard deviations provide an estimate of the size of the moving object around the centroid.
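The single-camera steps described above (thresholded difference image, axis projections, centroid, and standard deviations) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the prototype's code: pictures are nested lists indexed as `picture[y][x]`, pixels above the threshold keep their absolute difference as a weight, and the names `difference_image` and `blob_geometry` are hypothetical.

```python
import math


def difference_image(current, previous, epsilon):
    """Absolute difference of successive pictures, zeroed at or below epsilon."""
    return [[abs(c - p) if abs(c - p) > epsilon else 0
             for c, p in zip(cur_row, prev_row)]
            for cur_row, prev_row in zip(current, previous)]


def blob_geometry(diff):
    """Return ((x0, y0), (sigma_x, sigma_y)), or None if nothing changed."""
    proj_x = [sum(col) for col in zip(*diff)]   # projection onto the X axis
    proj_y = [sum(row) for row in diff]         # projection onto the Y axis
    total = sum(proj_x)
    if total == 0:
        return None
    x0 = sum(x * p for x, p in enumerate(proj_x)) / total
    y0 = sum(y * p for y, p in enumerate(proj_y)) / total
    sigma_x = math.sqrt(sum((x - x0) ** 2 * p for x, p in enumerate(proj_x)) / total)
    sigma_y = math.sqrt(sum((y - y0) ** 2 * p for y, p in enumerate(proj_y)) / total)
    return (x0, y0), (sigma_x, sigma_y)


previous = [[10] * 6 for _ in range(6)]
current = [row[:] for row in previous]
for y in range(1, 5):            # object occupies rows 1..4
    for x in range(2, 4):        # and columns 2..3
        current[y][x] = 110

centroid, sizes = blob_geometry(difference_image(current, previous, epsilon=5))
print(centroid)                              # (2.5, 2.5)
print(tuple(round(s, 3) for s in sizes))     # (0.5, 1.118)
```

The larger standard deviation along Y reflects that the synthetic object is taller than it is wide in the camera frame.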
This process, using only a single camera, provides an estimate of the position and size of an object and, with a sequence of pictures, an estimate of the speed and direction of motion. All of this information is by itself very useful and is difficult to obtain in many Safeguards scenarios. Despite the utility of the single camera process, there are still several shortcomings with respect to the objective of recognizing objects in motion. These are the subject of the work that has been accomplished in the past few months.
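As a brief illustration of how a sequence of centroids yields speed and direction, the sketch below differences successive centroid positions. It assumes a constant frame rate and works in camera pixel units; the function name `centroid_velocity` and the sample track are hypothetical.

```python
import math


def centroid_velocity(centroids, frame_rate_hz):
    """Estimate (speed in pixels/s, heading in degrees) between successive centroids."""
    estimates = []
    for (x_a, y_a), (x_b, y_b) in zip(centroids, centroids[1:]):
        dx, dy = x_b - x_a, y_b - y_a
        speed = math.hypot(dx, dy) * frame_rate_hz      # pixels per second
        heading = math.degrees(math.atan2(dy, dx))      # angle from the +X axis
        estimates.append((speed, heading))
    return estimates


# Hypothetical centroid track from three successive pictures at 5 frames/s.
track = [(10.0, 20.0), (13.0, 24.0), (16.0, 28.0)]
for speed, heading in centroid_velocity(track, frame_rate_hz=5.0):
    print(f"{speed:.1f} px/s at {heading:.1f} deg")
# 25.0 px/s at 53.1 deg
# 25.0 px/s at 53.1 deg
```

Converting pixel speeds to physical units would additionally require the camera geometry, which is outside the scope of this sketch.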
Recent Progress
Previous work demonstrated the feasibility of the approach, but was restricted to a single camera and was naturally somewhat limited in scope. Recent work has expanded the analysis to include 3-D reconstructions of the objects in the field of view and addresses several restrictions of the earlier implementation. In particular:
- Implementation was restricted to the horizontal plane; thus unable to trace vertical movements.
- The previous methods were able to track only a single object at a time.
- Estimate of the object's size was relative to the camera reference frame rather than along the object's principal axes of inertia.
Because the identification of a moving object is strongly connected to its generalized geometric characteristics, it follows that the estimate of size along the principal axes of inertia is the most critical enabling element of the technology. Therefore we use this particular issue as an example of the progress made to date.
The approach is to have each camera process its frames separately to determine the principal central axis and object sizes along this axis. This is accomplished in the camera by determining the two eigenvectors of the symmetrical 2x2 submatrix of the moment of inertia tensor. These values, representing the position vectors of the ends of two segments lying in a horizontal plane located at the centroid of the object, are sent to the master server where they are transformed, using the direction cosines of the camera placement geometry, into a common 3-D coordinate system. The problem is that the three pairs of such straight lines do not intersect in one point, but have intersections near the spatial center of the object. The server selects the one segment from each pair of segments which most probably corresponds to a true horizontal segment in the object. Using the fact that the longest segment should be the actual length, width, diagonal, or height of the object, the server estimates the size of the object based on maximum-likelihood collections of three selected segments. The process is represented as follows:
Assume a common coordinate system - XYZ, and three local camera coordinate systems - x1y1z1, ..., x3y3z3. Then the geometric center of an object defined on each coordinate frame can also be defined on the common frame using standard transforms. The projections in the camera frames can then be used to compute the components of the tensor of inertia, J, relative to the geometric center:

Where Xi = (xi-x0) and Zj = (zj-z0) are the distances from the given point to the centroid along the appropriate axes of the matrix. The principal axes of the object are found from the eigenvectors of the matrix. Denoting the elements in the standard way, i.e. Jxx, -Jxz = -Jzx, and Jzz, we can express the characteristic polynomial as:

and because the submatrix is symmetric with real components, its characteristic polynomial has two real roots, l x =l h . The unit eigenvectors x ( x x, x z) and h ( h x, h z) are found from the system:

The eigenvectors x and h set the directions of the principal central axes while the eigenvalues are the moments of inertia about the principal central axes. That is, the eigenvalues represent the average deviations of the projection of intensity on these axes, which is what allows them to be used to estimate the size of the object. Given the principal axes and the moments about the principal axes, the entire set of systems can be translated to the common coordinate system of the server using the direction cosines to the camera coordinate systems. Within the common reference frame the object can be reconstructed in 3-space and the length, width, and height can be estimated in the natural coordinates of the principal axes. These values can then index into a table of characteristics of objects of interest to assign an identity to the detected object.
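The per-camera eigen-analysis and the translation to the common frame can be sketched as below. The sketch assumes weighted pixel coordinates in a camera's x-z plane, uses the closed-form solution for a symmetric 2x2 matrix, and assumes a direction-cosine convention in which entry `[i][k]` is the cosine of the angle between common axis i and camera axis k. The function names, the sample blob, and the camera orientation are all hypothetical.

```python
import math


def principal_axes(points):
    """Principal central axes of weighted 2-D points [(x, z, weight), ...].

    Returns the centroid and two (eigenvalue, unit_vector) pairs of the matrix
    [[J_xx, -J_xz], [-J_xz, J_zz]], ordered from smaller to larger eigenvalue.
    """
    total = sum(w for _, _, w in points)
    x0 = sum(w * x for x, _, w in points) / total
    z0 = sum(w * z for _, z, w in points) / total
    j_xx = sum(w * (z - z0) ** 2 for _, z, w in points)
    j_zz = sum(w * (x - x0) ** 2 for x, _, w in points)
    j_xz = sum(w * (x - x0) * (z - z0) for x, z, w in points)

    # Roots of the characteristic polynomial of the symmetric 2x2 matrix.
    half_trace = 0.5 * (j_xx + j_zz)
    radius = math.hypot(0.5 * (j_xx - j_zz), j_xz)
    lam_small, lam_large = half_trace - radius, half_trace + radius

    # Closed-form direction of the eigenvector for the larger eigenvalue;
    # the other eigenvector is perpendicular to it.
    theta = 0.5 * math.atan2(-2.0 * j_xz, j_xx - j_zz)
    v_large = (math.cos(theta), math.sin(theta))
    v_small = (-math.sin(theta), math.cos(theta))
    return (x0, z0), (lam_small, v_small), (lam_large, v_large)


def to_common_frame(vec_cam, direction_cosines):
    """Rotate a camera-frame 3-vector into the common frame (assumed convention)."""
    return tuple(sum(row[k] * vec_cam[k] for k in range(3))
                 for row in direction_cosines)


# A 6 x 2 block of equally weighted pixels, elongated along the camera x axis.
blob = [(x, z, 1.0) for x in range(6) for z in range(2)]
centroid, (lam_small, v_small), (lam_large, v_large) = principal_axes(blob)

# The axis with the smaller moment of inertia is the long axis of the blob;
# the spread along it is given by the larger moment (taken about the other axis).
n = len(blob)
length_scale = math.sqrt(lam_large / n)
width_scale = math.sqrt(lam_small / n)
print(round(length_scale, 3), round(width_scale, 3))   # 1.708 0.5

# Hypothetical camera whose x axis lies along the common Z axis.
cosines = [[0, 0, -1],
           [0, 1, 0],
           [1, 0, 0]]
long_axis_common = to_common_frame((v_small[0], 0.0, v_small[1]), cosines)
```

Eigenvectors are defined only up to sign, so the long axis in the common frame may point along either +Z or -Z in this example. Combining the segments from three cameras into a single size estimate, as the server does, is not attempted here.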
Results
The methods developed to date are restricted to motion in a plane but, quite significantly, have demonstrated the ability to identify and discriminate between moving cars and bicycles, although not at the same time. It has also been found that the frame processing rate must be fast enough that the linearizations from variational methods apply with sufficient accuracy to limit errors in the estimated trajectories. On the other hand, if the frame rate is too high, the changes from scene to scene are so small that they cannot be reliably extracted from the normal background of pixel noise. Experimental evidence indicates that a rate of 5 to 8 frames per second provides the best overall performance. It is noted that this rate is an order of magnitude faster than current Safeguards cameras are capable of. Work continues to permit tracking and identification of multiple objects simultaneously, to extend the method to track objects in vertical motion, and to refine the criteria for distinguishing and identifying similar objects.
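The table lookup that assigns an identity from estimated dimensions might look like the following sketch. The catalog entries and their dimension ranges (in metres) are purely hypothetical placeholders chosen for illustration; they are not measurements or criteria from this work, and the name `identify` is likewise an assumption.

```python
# Hypothetical catalog: name -> ((min, max) length, (min, max) width, (min, max) height).
CATALOG = {
    "car": ((3.0, 5.5), (1.4, 2.2), (1.2, 2.0)),
    "bicycle": ((1.4, 2.0), (0.3, 0.8), (0.9, 1.3)),
}


def identify(length, width, height, catalog=CATALOG):
    """Return the single catalog entry whose ranges contain all three sizes, else None."""
    sizes = (length, width, height)
    matches = [name for name, ranges in catalog.items()
               if all(lo <= value <= hi for value, (lo, hi) in zip(sizes, ranges))]
    return matches[0] if len(matches) == 1 else None


print(identify(4.2, 1.8, 1.5))    # car
print(identify(1.7, 0.5, 1.1))    # bicycle
print(identify(10.0, 3.0, 3.0))   # None
```

Returning no identity when zero or several entries match keeps ambiguous detections from being labeled, which is one possible design choice among many.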
Conclusions
Significant progress has been made toward the goal of automatically detecting, identifying, and tracking objects of interest in a surveillance scenario. A significant amount of work remains to be completed to achieve this goal, and Smart Video is not expected to emerge as a mature technology until the turn of the century. Nonetheless, the results to date already offer potential benefits in today's remote monitoring Safeguards environment. The demonstrated ability to determine that a scene change was not due to lighting fluctuations, but was indeed due to the movement of a discrete object in the field of view, has the potential to dramatically reduce the false alarm rate. In specific complex but well defined Safeguards scenarios it may be possible to reliably discriminate between a small, fast, vertical object (human walking) and a large, slow, horizontal object (fuel assembly). This will enable the system to filter out incidental motion alarms that contain no Safeguards-significant information. Both of these capabilities can reduce the volume of data to be collected and reviewed, and may offer a means of reducing the amount of time an inspector must be on site during activities such as refueling. Clearly there are significant benefits that can be derived immediately from the progress to date in the development of Smart Video. The limiting factor in deployment of these techniques is the frame rate of currently available authenticating cameras. It is expected, however, that this limitation will be somewhat easier and faster to remove than was the development of the processing technology.
1 Wilson, Grahame, IAEA Optical Surveillance Discussion Paper, May 1993, Rev 2.0 93/DID-05/GW.
2 Hall, Ernest L., Computer Image Processing and Recognition, Academic Press, 1979. ISBN 0-12-318850-4.
3 Masters, Timothy, Signal and Image Processing with Neural Networks, John Wiley & Sons, 1993. ISBN 0-471-04963-8