Presented at the 39th Annual Meeting of the Institute of Nuclear Materials Management, July 1998
M. Ondrik, S. Kadner, S. Kraus, V. Thompson
CANBERRA
8401 Washington Place, N.E.
Albuquerque, NM 87113
Tel: (505) 828-9100
Fax: (505) 828-9115
email: vthompson@canberra-abq.com
W. Horak, A. Reisman
Brookhaven National Laboratory
P.O. Box 5000
Upton, NY 11973
Tel: (516) 344-2627
Fax: (516) 344-5344
A. Nadezhdinskii, S. Rudov, L. Medvedev, M. Spiridonov
General Physics Institute
Vavilova 38
Moscow, Russia 117942
Abstract
Ongoing collaboration in the area of Safeguards surveillance between CANBERRA, the General Physics Institute (GPI), and the International
Safeguards Project Office at Brookhaven National Laboratory (BNL) has achieved
significant advances in front-end motion detection capabilities, also known
as "Smart Video", for digital Safeguards systems. While research and
refinement are expected to continue until the year 2000, this project has seen
major progress throughout the past year. The algorithms and software for benchmarking
single and multi-camera systems have been developed and are in the process of
refinement. In fact, it has been determined that a single computer is not sufficient
for the concurrent tasks of grabbing images, digitizing the camera signal,
calculating and displaying results, etc. Hence, a master-slave network has been
developed which speeds up the process. This paper will demonstrate the ability
of Smart Video to automatically identify and recognize the movement of sensitive
items within the camera’s field of view.
Introduction
The concept of "Smart Video" originated with the Arms Control Treaty
Verification program. However, experiences with remote monitoring indicate that
this technology offers significant benefits to Safeguards. Consequently, a small
international research and development activity was initiated with the goal
of refining the concepts and establishing the feasibility of using Smart Video
in the Safeguards regime. The funding and program management are provided by
the International Safeguards Project Office at Brookhaven National Laboratory;
technical direction and application analysis are provided by CANBERRA; and the actual development of the Smart Video prototype is being performed
by researchers at the General Physics Institute in Moscow.
For many years there have been efforts to develop technologies that can reduce
the volume of data collected in Safeguards surveillance scenarios. Front-end
motion detection (FEMD) has been under development for Safeguards use for quite
some time. Within the past few years FEMD technology has matured to the point
that it has recently been deployed for large-scale remote monitoring field trials.
Smart Video has its roots in the basic scene-change detection that underlies
FEMD.
The intent of Smart Video is that without human direction or intervention the
camera system will detect movement of objects, identify any "objects of
interest", and record their trajectory. Smart Video can dramatically reduce
the volume of data collected in an unattended or remote monitoring scenario
by filtering out motion alarms that do not involve specific objects of interest.
We anticipate that the ability to recognize specific objects and record their
movements will be immediately beneficial in safeguarding activities
such as fuel transfers, materials movements, and refueling. These activities
are so complex that they currently either require an inspector to be physically
present or generate huge amounts of monitoring data, much of it
redundant, that must be reviewed by the inspector.
Background
Today’s motion detecting cameras all use variations of the same method
for detecting scene changes. Conceptually, each successive picture is compared
to a reference image pixel by pixel. If the number of pixels that are different
exceeds some threshold, the picture is declared to be different from the reference
image. In practice it is necessary to considerably preprocess the pictures to
suppress random noise from CCD dark currents and other sources. A simplified
conceptual representation of the FEMD process is:
An alarm is declared when
where $\delta$ is the alarm threshold and where

is the pixel by pixel difference between the current picture and the reference
image, the factor $\rho_j$ is a noise suppression term, and $\varepsilon$ is the difference sensitivity
threshold.
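As a rough illustration of the thresholded comparison described above, the following Python sketch flags a pixel as changed when its difference from the reference exceeds the sensitivity threshold, weights it by the noise suppression term, and declares an alarm when the weighted count exceeds the alarm threshold. The specific form of the test (a weighted count of changed pixels), the function name `femd_alarm`, and the use of NumPy are assumptions made for illustration, not the implementation used in any fielded camera.

```python
import numpy as np

def femd_alarm(current, reference, rho, epsilon, delta):
    """Return True when the weighted count of changed pixels exceeds delta.

    Assumed form (illustrative only):
        sum_j rho_j * [ |current_j - reference_j| > epsilon ]  >  delta
    """
    diff = np.abs(current.astype(float) - reference.astype(float))
    changed = diff > epsilon              # pixels that differ by more than epsilon
    score = float(np.sum(rho * changed))  # noise-weighted count of changed pixels
    return score > delta

# Synthetic 8x8 scene: a 3x3 patch (9 pixels) brightens by 50 grey levels.
reference = np.zeros((8, 8))
current = reference.copy()
current[2:5, 2:5] = 50.0
rho = np.ones_like(reference)  # uniform weights: no noise suppression applied

print(femd_alarm(current, reference, rho, epsilon=10.0, delta=5.0))
```

With uniform weights the score is simply the number of changed pixels, so the nine-pixel patch exceeds the threshold of five and the call reports an alarm.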
There are many different variations of this process that are actually implemented.
For example, the discrete cosine compression coefficients can be compared on
a block by block basis. In any case, it is assumed that the change in the pixel
values is due to motion in the camera’s field of view. However, motion
is not the only thing that can cause sufficient differences to generate an alarm.
Fluctuations in lighting, camera flicker, glint and other phenomena contribute
to a large number of false alarms. Furthermore, the camera detects and records
all scene changes, most of which are incidental to the actual surveillance and
have no Safeguards significance. Despite these limitations, current FEMD technology
has dramatically reduced the volume of data that must be collected and reviewed
compared to traditional time-lapse surveillance methods. Smart Video seeks to
reduce both false alarms and incidental alarms by extending the FEMD process
to consider changes to the geometry of the field of view, and then using the geometry
information to classify and identify objects in motion.
Most of the historical work to detect and categorize objects was
directed towards satellite imagery, for example detecting, identifying, and
counting the airplanes on a military base. The techniques developed
for this application depend on certain boundary conditions that hold
in this scenario, such as fixed vertical aspect, minimal motion, etc. There
was no attempt to actually detect and track motion in the Safeguards sense,
and the orientation of the "objects of interest" was relatively fixed
and constant. Much of the more recent work has been based on texture-mapping
in color-space, which is unavailable in Safeguards imagery. The vastly different
boundary conditions that hold in the Safeguards regime required that
new methods for detecting and characterizing objects be developed.
Technical Approach
There is no guarantee that any object of interest will remain oriented in any
particular way with respect to a camera’s field of view; that is, the shape
of an object as seen by the camera cannot be known a priori. For example, a
cylindrical materials container appears to be a rectangle when viewed from the
front and as a circle when viewed from the top. In order to identify the object
in an arbitrary orientation, it is necessary to characterize the field of view
and the object in three dimensions (3-D). Therefore it is known a priori that
the volume of interest must be monitored by at least three cameras that are
not coplanar with the objects under surveillance.

Figure 1
To simplify the interactions of multiple cameras, the approach was to segment
the workload so that each of the cameras could preprocess its own field of view
and the results from all cameras could be combined in a central server to create the
3-D image models (Figure 1). This segmentation approach allowed early efforts
to focus on developing the pre-processing algorithms that must be performed
in each camera’s local 2-D coordinate system. Results from all cameras
are combined at a server which translates the results into a common 3-D reference
frame. It is only in the 3-D coordinate system of the server that it is possible
to estimate the size, shape, moments, and motion of the object in order to determine
whether it represents one of the cataloged "objects of interest."
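The final matching step can be pictured as a table lookup on the estimated dimensions. The sketch below is a minimal illustration under stated assumptions: the `CatalogEntry` type, the `classify` function, the relative tolerance, and the catalog dimensions are all hypothetical placeholders, and the matching rule (largest relative error across sorted dimensions) is one simple choice rather than the criterion used in the prototype.

```python
from dataclasses import dataclass

@dataclass
class CatalogEntry:
    name: str
    length: float
    width: float
    height: float

def classify(dimensions, catalog, tolerance=0.25):
    """Return the name of the best-matching catalog entry, or None.

    Dimensions are sorted before comparison so that the match does not
    depend on which principal axis was labelled length, width or height.
    """
    measured = sorted(dimensions, reverse=True)
    best_name, best_err = None, None
    for entry in catalog:
        ref = sorted((entry.length, entry.width, entry.height), reverse=True)
        # Worst-case relative error over the three dimensions.
        err = max(abs(m - r) / r for m, r in zip(measured, ref))
        if err <= tolerance and (best_err is None or err < best_err):
            best_name, best_err = entry.name, err
    return best_name

# Hypothetical catalog: the dimensions (metres) are placeholders, not measured values.
catalog = [
    CatalogEntry("car", 4.5, 1.8, 1.5),
    CatalogEntry("bicycle", 1.8, 0.6, 1.1),
]

print(classify((4.3, 1.7, 1.5), catalog))  # prints "car"
```

An estimate that matches no entry within the tolerance returns `None`, which corresponds to motion that does not involve a cataloged object of interest.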
Work to date has concentrated on ensuring that each camera is able to isolate
any moving object within its field of view while rejecting scene changes due
to fluctuations in lighting. This is done by tracing movement of the centroid
(in the camera frame of reference) of the pixels that represent the changes
from scene to scene in a sequence of pictures (Figure 2).

Figure 2
Designating the picture taken at time $n$ as $B_n$,
the pixels in that picture are designated as $B_n(x_i, y_j)$.
As with the basic FEMD processes, the thresholded difference image, $D_n$,
at time $n$ is then constructed by subtracting successive pictures pixel
by pixel as follows:

At this point the FEMD process is extended to estimate the geometry of the
"blob" of remaining pixels in the difference image. The projections
of the difference image along the X and Y axes are computed in the normal way:

The average value of the axis projections is used to locate the centroid of
the pixel cluster:

The centroid provides a rough estimate of the center of the moving object.
Finally, the standard deviations in both axes are computed:

The standard deviations provide an estimate of the size of the moving object
around the centroid.
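The sequence of steps above (thresholded difference, axis projections, centroid, standard deviations) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the difference image keeps the absolute difference where it exceeds a threshold `epsilon` and is zero elsewhere, the centroid and standard deviations are intensity-weighted, and the function name `blob_statistics` is hypothetical. The normalisation actually used by the authors may differ.

```python
import numpy as np

def blob_statistics(prev_frame, frame, epsilon):
    """Centroid and spread of the thresholded difference between two frames.

    Frames are 2-D arrays indexed [row (y), column (x)].
    Returns (x0, y0, sigma_x, sigma_y), or None if no pixel changed.
    """
    diff = np.abs(frame.astype(float) - prev_frame.astype(float))
    d = np.where(diff > epsilon, diff, 0.0)   # thresholded difference image (assumed form)
    total = d.sum()
    if total == 0:
        return None

    xs = np.arange(d.shape[1])
    ys = np.arange(d.shape[0])
    px = d.sum(axis=0)   # projection onto the X axis (summed over rows)
    py = d.sum(axis=1)   # projection onto the Y axis (summed over columns)

    # Centroid: weighted mean position along each axis.
    x0 = (xs * px).sum() / total
    y0 = (ys * py).sum() / total

    # Standard deviations: weighted spread about the centroid.
    sigma_x = np.sqrt((((xs - x0) ** 2) * px).sum() / total)
    sigma_y = np.sqrt((((ys - y0) ** 2) * py).sum() / total)
    return x0, y0, sigma_x, sigma_y

# Synthetic example: a 2-row by 4-column block appears in an empty scene.
prev_frame = np.zeros((10, 10))
frame = prev_frame.copy()
frame[2:4, 4:8] = 100.0

x0, y0, sigma_x, sigma_y = blob_statistics(prev_frame, frame, epsilon=10.0)
print(x0, y0, sigma_x, sigma_y)
```

For this block the centroid falls at (5.5, 2.5), and the standard deviation is larger along X than along Y, reflecting the wider horizontal extent of the changed region.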
This process, using only a single camera, provides an estimate of the position and
size of an object, and with a sequence of pictures provides an estimate of the
speed and direction of motion. All of this information by itself is very useful
and difficult to obtain in many Safeguards scenarios. Despite the utility of
the single camera process, there are still several shortcomings with respect
to the objective of recognizing objects in motion. These are the subject of
the work that has been accomplished in the past few months.
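To make the speed and direction estimate concrete, the following sketch differences two successive centroids and divides by the frame interval. It is only an illustration: the function name `velocity_from_centroids` is hypothetical, the result is in camera-frame pixels per second (no calibration to physical units is assumed), and the heading is measured in image coordinates.

```python
import math

def velocity_from_centroids(c_prev, c_curr, dt):
    """Estimate speed and heading from two successive centroids.

    c_prev, c_curr: (x, y) centroids in pixels; dt: frame interval in seconds.
    Returns (speed in pixels/second, heading in degrees from the +X axis).
    """
    vx = (c_curr[0] - c_prev[0]) / dt
    vy = (c_curr[1] - c_prev[1]) / dt
    speed = math.hypot(vx, vy)
    heading = math.degrees(math.atan2(vy, vx))
    return speed, heading

# Centroid moves 3 pixels in X and 4 pixels in Y between frames 0.2 s apart.
speed, heading = velocity_from_centroids((10.0, 10.0), (13.0, 14.0), dt=0.2)
print(speed, heading)
```

A displacement of 5 pixels over 0.2 seconds gives a speed of 25 pixels per second in this example.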
Recent Progress
Previous work demonstrated the feasibility of the approach, but was restricted
to a single camera and was naturally somewhat limited in scope. Recent work
has expanded the analysis to include 3-D reconstructions of the objects in the
field of view and to address several restrictions of the earlier implementation. In particular:
- Implementation was restricted to the horizontal plane and was thus unable to trace
vertical movements.
- The previous methods were able to track only a single object at a time.
- Estimate of the object’s size was relative to the camera reference
frame rather than along the object’s principal axes of inertia.
Because the identification of a moving object is strongly connected to its
generalized geometric characteristics, it follows that the estimate of size
along the principal axes of inertia is the most critical enabling element of
the technology. Therefore we use this particular issue as an example of the
progress made to date.
The approach is to have each camera process its frames
separately to determine the principal central axis and object sizes
along this axis. This is accomplished in the camera by determining the
two eigenvectors of the symmetrical 2x2 submatrix of the moment of inertia
tensor. These values, representing the position vectors of the ends
of two segments lying in a horizontal plane located at the centroid
of the object, are sent to the master server where they are transformed,
using the direction cosines of the camera placement geometry, into a
common 3-D coordinate system. The problem is that the three pairs of
such straight lines do not intersect in one point, but have intersections
near the spatial center of the object. The server selects the one segment
from each pair of segments which most probably corresponds to a true
horizontal segment in the object. Using the fact that the longest segment
should be the actual length, width, diagonal, or height of the object,
the server estimates the size of the object based on maximum-likelihood
collections of three selected segments. The process is represented as
follows:
Assume a common coordinate system, $XYZ$, and three local camera coordinate
systems, $x_1y_1z_1$, ..., $x_3y_3z_3$.
Then the geometric center of an object defined on each coordinate frame can
also be defined on the common frame using standard transforms. The projections
in the camera frames can then be used to compute the components of the tensor
of inertia, J, relative to the geometric center:

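where $X_i = (x_i - x_0)$ and $Z_j = (z_j - z_0)$
are the distances from the given point to the centroid along the appropriate
axes of the matrix. The principal axes of the object are found from the eigenvectors
of the matrix. Denoting the elements in the standard way, i.e. $J_{xx}$,
$-J_{xz} = -J_{zx}$, and $J_{zz}$, we can express the characteristic
polynomial as:
$$\lambda^2 - (J_{xx} + J_{zz})\,\lambda + J_{xx}J_{zz} - J_{xz}^2 = 0$$
and because the submatrix is symmetric with real components, its characteristic
polynomial has two real roots, $\lambda_\xi$ and $\lambda_\eta$. The unit eigenvectors $\xi = (\xi_x, \xi_z)$
and $\eta = (\eta_x, \eta_z)$ are found from the system:
$$(J_{xx} - \lambda_\xi)\,\xi_x - J_{xz}\,\xi_z = 0, \qquad -J_{zx}\,\xi_x + (J_{zz} - \lambda_\xi)\,\xi_z = 0, \qquad \xi_x^2 + \xi_z^2 = 1$$
with the analogous system for $\eta$ and $\lambda_\eta$.
The eigenvectors $\xi$ and $\eta$ set the directions of the principal central axes while
the eigenvalues are the moments of inertia about the principal central axes.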
That is, the eigenvalues represent the average deviations of the projection
of intensity on these axes, which is what allows them to be used to estimate
the size of the object. Given the principal axes and the moments about the principal
axes, the entire set of systems can be translated to the common coordinate system
of the server using the direction cosines to the camera coordinate systems.
Within the common reference frame the object can be reconstructed in 3-space
and the length, width, and height can be estimated in the natural coordinates
of the principal axes. These values can then index into a table of characteristics
of objects of interest to assign an identity to the detected object.
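The per-camera eigenvector step and the change to the common frame can be sketched as follows. This is a minimal sketch under stated assumptions: the moments are normalised by the total intensity, `numpy.linalg.eigh` is used in place of solving the characteristic polynomial by hand, and the names `principal_axes` and `to_common_frame` as well as the example rotation matrix are hypothetical. It does not reproduce the server's segment-selection or maximum-likelihood logic.

```python
import numpy as np

def principal_axes(weights, xs, zs):
    """Centroid, principal moments and principal axes of a weighted pixel cluster.

    weights[j, i] is the difference-image intensity at (xs[i], zs[j]).
    Returns ((x0, z0), moments, axes); moments are in ascending order and
    axes[:, k] is the unit eigenvector belonging to moments[k].
    """
    weights = np.asarray(weights, dtype=float)
    total = weights.sum()
    X, Z = np.meshgrid(xs, zs)            # coordinate grids matching weights
    x0 = (weights * X).sum() / total
    z0 = (weights * Z).sum() / total
    Xc, Zc = X - x0, Z - z0               # distances to the centroid

    jxx = (weights * Zc ** 2).sum() / total   # moment about the x axis
    jzz = (weights * Xc ** 2).sum() / total   # moment about the z axis
    jxz = (weights * Xc * Zc).sum() / total   # product of inertia
    J = np.array([[jxx, -jxz], [-jxz, jzz]])  # symmetric 2x2 submatrix

    moments, axes = np.linalg.eigh(J)
    return (x0, z0), moments, axes

def to_common_frame(vector_cam, direction_cosines):
    """Rotate a camera-frame vector into the common frame.

    direction_cosines is the 3x3 matrix of cosines between the camera
    axes and the common axes (assumed known from the camera placement).
    """
    return np.asarray(direction_cosines, dtype=float) @ np.asarray(vector_cam, dtype=float)

# A uniform bar, 10 pixels long in x and 2 pixels wide in z.
bar = np.ones((2, 10))
centroid, moments, axes = principal_axes(bar, np.arange(10), np.arange(2))
long_axis = axes[:, 0]            # smallest moment of inertia: the long axis
rms_length = np.sqrt(moments[1])  # spread along the long axis
rms_width = np.sqrt(moments[0])   # spread across it

# Hypothetical camera rotated 90 degrees about the vertical axis.
R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
print(centroid, moments, long_axis, to_common_frame([1.0, 0.0, 0.0], R))
```

Note that the moment of inertia about an axis measures the spread perpendicular to that axis, so the long dimension of the object lies along the eigenvector with the smallest eigenvalue, and its extent is estimated from the larger eigenvalue.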
Results
The methods developed to date are restricted to motion in a plane, but, quite
significantly, have demonstrated their ability to identify and discriminate
between moving cars and bicycles, although not at the same time. It has also
been found that the frame processing rate must be sufficiently fast that linearizations
from variational methods apply with sufficient accuracy to reduce errors in
estimated trajectories. On the other hand, if the frame rate is too high the
changes from scene to scene are so small that they cannot be reliably extracted
from the normal background of pixel noise. Experimental evidence indicates that
a rate of 5 to 8 frames per second provides the best overall performance. It
should be noted that this rate is an order of magnitude higher than current Safeguards
cameras can achieve. Work continues to permit tracking and identification
of multiple objects simultaneously, to extend the method to track objects in
vertical motion, and to refine the criteria for distinguishing and identifying
similar objects.
Conclusions
Significant progress has been made toward the goal of
automatically detecting, identifying, and tracking objects of interest
in a surveillance scenario. A significant amount of work remains to
be completed to achieve this goal, and Smart Video is not expected to
emerge as a mature technology until the turn of the century. Nonetheless,
the results to date already offer potential benefits in today’s
remote monitoring Safeguards environment. The demonstrated ability to
determine that a scene change was not due to lighting fluctuations,
but was indeed due to the movement of a discrete object in the field
of view has the potential to dramatically reduce the false alarm rate.
In specific complex but well-defined Safeguards scenarios it may be
possible to reliably discriminate between a small, fast, vertical object
(human walking) and a large, slow, horizontal object (fuel assembly).
This will enable the system to filter out incidental motion alarms that
contain no Safeguards-significant information. Both of these capabilities
can reduce the volume of data to be collected and reviewed, and may
offer a means of reducing the amount of time an inspector must be on
site during activities such as refueling. Clearly there are significant
benefits that can be derived immediately from the progress to date in
the development of Smart Video. The limiting factor in deployment of
these techniques is the frame-rate of currently available authenticating
cameras. It is expected, however, that this limitation will be somewhat
easier and faster to remove than was the development of the processing
technology.
1 Wilson, Grahame, IAEA Optical Surveillance Discussion Paper, May 1993, Rev 2.0 93/DID-05/GW.
2 Hall, Ernest L., Computer Image Processing and Recognition, Academic Press, 1979. ISBN 0-12-318850-4.
3 Masters, Timothy, Signal and Image Processing with Neural Networks, John Wiley & Sons, 1993. ISBN 0-471-04963-8.