Motion capture, also known as motion tracking or mocap, is a technique for digitally recording movement for entertainment, sports and medical applications.
Motion tracking or motion capture started as an analysis tool in biomechanics research, and expanded into education, training, sports and, more recently, computer animation for cinema and video games as the technology matured. A performer wears markers near each joint to identify the motion by the positions or angles between the markers. Acoustic, inertial, LED, magnetic or reflective markers, or combinations of any of these, are tracked, optimally at least two times the rate of the desired motion, to sub-millimeter positions. The motion capture computer software records the positions, angles, velocities, accelerations and impulses, providing an accurate digital representation of the motion.
In entertainment applications this can reduce the cost of animation, which otherwise requires the animator to draw each frame or, with more sophisticated software, key frames that are interpolated by the software. Motion capture saves time and creates more natural movements than manual animation, but is limited to motions that are anatomically possible. Some applications require movements that real actors cannot perform, such as exaggerated superhero martial arts or stretching and squashing, and these must still be animated by hand.
In biomechanics, sports and training, real time data can provide the necessary information to diagnose problems or suggest ways to improve performance, driving ever faster motion capture technology.
Optical systems triangulate the 3D position of a marker using a number of cameras calibrated to provide overlapping projections. Tracking a larger number of markers or multiple performers, or expanding the capture area, is accomplished by adding more cameras. These systems produce data with three degrees of freedom for each marker, and rotational information must be inferred from the relative orientation of three or more markers; for instance, shoulder, elbow and wrist markers provide the angle of the elbow. A newer technique discussed below uses higher-resolution linear detectors to derive one-dimensional positions, requiring more sensors and more computation, but providing higher resolution (sub-millimeter, down to 10 micrometers time-averaged) and higher speed than is possible with area arrays.
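The triangulation step can be sketched as follows: each calibrated camera contributes a ray from its optical center toward the marker's image, and the marker's 3D position is estimated where the rays (nearly) meet. This is a minimal sketch using the midpoint of the two closest ray points; real systems combine many cameras with a least-squares solve, and the camera positions below are made up for illustration.

```python
# Minimal two-camera triangulation sketch: estimate a marker's 3D
# position as the midpoint of the shortest segment between two rays.

def closest_point_between_rays(o1, d1, o2, d2):
    """Midpoint of the shortest segment joining ray o1+t*d1 and ray o2+s*d2."""
    def dot(a, b): return sum(x * y for x, y in zip(a, b))
    def sub(a, b): return [x - y for x, y in zip(a, b)]
    w = sub(o1, o2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b          # zero only if the rays are parallel
    t = (b * e - c * d) / denom    # closest-approach parameters from
    s = (a * e - b * d) / denom    # minimizing |o1 + t*d1 - o2 - s*d2|^2
    p1 = [o + t * di for o, di in zip(o1, d1)]
    p2 = [o + s * di for o, di in zip(o2, d2)]
    return [(x + y) / 2 for x, y in zip(p1, p2)]

# Two hypothetical cameras at (-1,0,0) and (1,0,0), both seeing a
# marker at (0,0,2); the rays intersect exactly at the marker.
marker = closest_point_between_rays([-1, 0, 0], [1, 0, 2],
                                    [1, 0, 0], [-1, 0, 2])
```

With more than two cameras the same idea generalizes to a least-squares intersection, which also averages out per-camera noise.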
Passive optical systems use reflective markers illuminated by strobes on the camera and triangulate each marker's position from its 2D location in each camera's image. Data can be cleaned up with the aid of kinematic constraints and predictive gap-filling algorithms. Passive systems typically use sensors such as Micron's, where the camera captures an image of the scene, reduces it to bright spots and finds each spot's centroid. These 1.3 megapixel sensors can run at roughly 650,000,000 pixels per second divided by the applied resolution, so at the full 1.3 megapixels they can operate at 500 frames per second.
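The on-camera reduction described above can be sketched in a few lines: threshold the image to its bright pixels, then take the intensity-weighted centroid, which localizes the marker to sub-pixel precision. The tiny 5x5 "image" is invented for illustration.

```python
# Sketch of the bright-spot centroid step a passive-optical camera
# performs: keep pixels above a threshold and compute their
# intensity-weighted centroid for a sub-pixel marker position.

def blob_centroid(image, threshold):
    """Intensity-weighted centroid (x, y) of pixels above threshold."""
    total = sx = sy = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value >= threshold:
                total += value
                sx += x * value
                sy += y * value
    if total == 0:
        return None            # no bright spot in this image
    return (sx / total, sy / total)

# A made-up marker blob centered on pixel (2, 2):
image = [
    [0,  0,  0,  0, 0],
    [0, 10, 20, 10, 0],
    [0, 20, 40, 20, 0],
    [0, 10, 20, 10, 0],
    [0,  0,  0,  0, 0],
]
center = blob_centroid(image, threshold=5)
```

Because the centroid weights every bright pixel, a marker several pixels wide can be located far more precisely than one pixel, which is where the sub-millimeter figures above come from.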
Micron's 4 megapixel sensor costs about $1,000 and can run at 640,000,000 pixels per second divided by the applied resolution. By decreasing the resolution to 640 x 480, these cameras can sample at 2,000 frames per second, trading spatial resolution for temporal resolution. At full resolution they run at about 166 frames per second, but typically are run at 100 to 120 frames per second. With about 200 LED strobes synchronized to the CMOS sensor, the ease of combining a hundred dollars' worth of LEDs with a $1,000 sensor has made these systems very popular.
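The pixel-rate arithmetic above reduces to one formula: a sensor with a fixed pixel throughput trades spatial resolution for frame rate. A quick sketch (the 2,000 x 2,000 window is an assumed nominal 4 megapixel geometry, not a documented sensor mode):

```python
# Frame rate of a sensor with fixed pixel throughput: the sensor reads
# a constant number of pixels per second, so fewer pixels per frame
# means more frames per second.

def max_frame_rate(pixel_rate_per_s, width, height):
    """Highest frame rate a fixed-throughput sensor supports at a window size."""
    return pixel_rate_per_s / (width * height)

# 640 Mpixel/s sensor windowed to VGA: roughly 2,000 fps
vga_fps = max_frame_rate(640_000_000, 640, 480)

# Same throughput over a nominal 4 Mpixel full frame: about 160 fps
full_fps = max_frame_rate(640_000_000, 2000, 2000)
```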
Professional vendors have sophisticated software to reduce problems from marker swapping, since all markers appear identical. Unlike active marker systems and magnetic systems, passive systems do not require the user to wear wires or electronic equipment. Passive markers are usually spheres or hemispheres made of plastic or foam, 3 to 25 mm in diameter, covered with special retroreflective tape. This type of system can capture large numbers of markers at frame rates as high as 2,000 fps with high 3D accuracy. Manufacturers of this type of system include Vicon-Peak, Simi, Motion Analysis and BTS.
Active marker systems have an advantage over passive in that there is no doubt about which marker is which. Because the markers are typically illuminated or identified in sequence, the overall update rate drops as the marker count increases; 5,000 frames per second divided among 100 markers provides updates of 50 hertz per marker. Even so, the certainty of marker identification has made these systems popular in the biomechanics market. Two such active marker systems are Optotrak by Northern Digital and the Visualeyez system by PhoeniX Technologies Inc.
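The time-multiplexing trade-off described above is simple division: if only one LED is lit per frame, each marker is revisited once every cycle through all markers. A minimal sketch:

```python
# Update rate of a time-multiplexed active marker system: the sensor
# frame rate is shared among all markers, since each frame sees one.

def marker_update_rate(frame_rate_hz, num_markers):
    """Per-marker update rate when markers are lit one per frame."""
    return frame_rate_hz / num_markers

# 5,000 frames per second shared among 100 markers: 50 Hz per marker
rate = marker_update_rate(5000, 100)
```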
Higher-resolution active marker systems such as PhaseSpace capture more subtle movements by providing marker IDs in real time: the output of each LED is modulated to differentiate it, allowing up to 32 markers to be lit at the same time, eliminating marker swapping and providing much cleaner data than older technologies. Smart LEDs allow motion capture outdoors in direct sunlight, while providing 3,600 × 3,600 (about 12 megapixel) resolution at 480 frames per second. The advantage of active markers is that intelligent processing allows higher speed and higher resolution than passive optical systems at a lower price. This higher accuracy and resolution requires more processing than older passive technologies, but the additional processing is done at the camera, improving resolution via subpixel or centroid processing and providing both high resolution and high speed. By using newer processing and technology, these motion capture systems are about 1/3 the cost of older systems.
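One way to picture the ID modulation: each LED blinks a unique binary pattern over a short group of frames, so a decoder can recover the marker ID purely from its on/off history, with no geometric inference and no possibility of swapping. This is an illustrative scheme only, not PhaseSpace's actual protocol; the 5-bit code length is chosen to match the 32-marker figure above.

```python
# Illustrative marker-ID modulation: each marker ID is broadcast as a
# unique 5-bit blink pattern (one bit per frame), so 2**5 = 32 markers
# can be distinguished without geometric reasoning.

CODE_BITS = 5  # 32 distinct marker IDs

def encode_id(marker_id):
    """Blink pattern (one 0/1 per frame, LSB first) for a marker ID."""
    return [(marker_id >> bit) & 1 for bit in range(CODE_BITS)]

def decode_pattern(pattern):
    """Recover a marker ID from its observed blink pattern."""
    return sum(bit << i for i, bit in enumerate(pattern))
```

Real systems would add synchronization and error tolerance on top, but the round trip ID → pattern → ID is the core idea.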
Magnetic systems calculate position and orientation from the relative magnetic flux of three orthogonal coils on both the transmitter and each receiver. The relative intensity of the voltage or current of the three coils allows these systems to calculate both range and orientation by meticulously mapping the tracking volume. Since the sensor output is 6DOF, useful results can be obtained with two-thirds the number of markers required in optical systems; one on the upper arm and one on the lower arm yield elbow position and angle. The markers are not occluded by nonmetallic objects but are susceptible to magnetic and electrical interference from metal objects in the environment, like rebar (steel reinforcing bars in concrete) or wiring, which affect the magnetic field, and from electrical sources such as monitors, lights, cables and computers. The sensor response is nonlinear, especially toward the edges of the capture area. The wiring from the sensors tends to preclude extreme performance movements. The capture volumes for magnetic systems are dramatically smaller than those of optical systems. Among magnetic systems there is a distinction between “AC” systems, which use sine-wave excitation, and “DC” systems, which use square pulses. Two magnetic system vendors are Ascension Technology and Polhemus.
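A rough sketch of why flux intensity gives range: a dipole transmitter's field magnitude falls off as 1/r³, so once the volume is calibrated, the measured flux can be inverted to a distance. The constant below is illustrative and folds together all physical factors; real trackers also use the per-coil ratios to recover orientation.

```python
# Idealized dipole falloff: field magnitude ~ k / r^3, so a calibrated
# flux measurement can be inverted to recover range.

def field_magnitude(k, r):
    """On-axis dipole field strength at range r (k folds in all constants)."""
    return k / r ** 3

def range_from_field(k, measured):
    """Invert the 1/r^3 falloff to recover the transmitter-sensor range."""
    return (k / measured) ** (1.0 / 3.0)

# Round trip with an arbitrary calibration constant k = 8.0:
r = range_from_field(8.0, field_magnitude(8.0, 2.0))
```

The cubic falloff is also why magnetic capture volumes are so small: doubling the range cuts the signal by a factor of eight.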
Mechanical motion capture systems directly track body joint angles and are often referred to as exoskeleton motion capture systems, due to the way the sensors are attached to the body. A performer attaches the skeletal-like structure to their body, and as they move so do the articulated mechanical parts, measuring the performer's relative motion. Mechanical motion capture systems are real-time, relatively low-cost, occlusion-free, and wireless (untethered) systems with unlimited capture volume. Typically, they are rigid structures of jointed, straight metal or plastic rods linked together with potentiometers that articulate at the joints of the body. A newer and more flexible take on exoskeleton motion capture is the ShapeWrap II system by Measurand Inc., a mocap system based on ShapeTapes that flex. By conforming to limbs instead of following rigid paths, ShapeWrap II moves with the body to capture fine details of shape on a wide variety of body types.
In the motion capture session, the movements of one or more actors are sampled many times per second. High resolution optical motion capture systems can be used to sample body, facial and finger movement at the same time.
A motion capture session records only the movements of the actor, not the actor's visual appearance. These movements are recorded as animation data which are mapped to a 3D model (human, giant robot, etc.) created by a computer artist, so that the model moves the same way. This is comparable to the older technique of rotoscoping, in which the motion of an actor was filmed and the film then used as a guide for the frame-by-frame motion of a hand-drawn animated character.
If desired, a camera can pan, tilt, or dolly around the stage while the actor is performing and the motion capture system can capture the camera and props as well. This allows the computer generated characters, images and sets, to have the same perspective as the video images from the camera. A computer processes the data and displays the movements of the actor, as inferred from the 3D position of each marker. If desired, a virtual or real camera can be tracked as well, providing the desired camera positions in terms of objects in the set.
A related technique, match moving, can derive 3D camera movement from a single 2D image sequence without the use of photogrammetry, but is often ambiguous below centimeter resolution due to the inability to distinguish pose and scale characteristics from a single vantage point. One might extrapolate that future technology could include full-frame imaging from many camera angles to record the exact position of every part of the actor's body, clothing, and hair for the entire duration of the session, resulting in a higher resolution of detail than is possible today.
After processing, the software exports animation data, which computer animators can associate with a 3D model and then manipulate using normal computer animation software such as Maya or 3D Studio Max. If the actor’s performance was good and the software processing was accurate, this manipulation is limited to placing the actor in the scene that the animator has created and controlling the 3D model’s interaction with objects.
Mocap offers several advantages over traditional computer animation of a 3D model:
- Mocap can take far fewer man-hours of work to animate a character. One actor working for a day (and then technical staff working for many days afterwards to clean up the mocap data) can create a great deal of animation that would have taken months for traditional animators.
- Mocap can capture secondary animation that traditional animators might not have had the skill, vision, or time to create. For example, a slight movement of the hip by the actor might cause his head to twist slightly. This nuance might be understood by a traditional animator but be too time consuming and difficult to accurately represent, but it is captured accurately by mocap, which is why mocap animation often seems shockingly realistic compared with hand animated models. Incidentally, one of the hallmarks of rotoscope in traditional animation is just such secondary “business.”
- Mocap can accurately capture difficult-to-model physical movement. For example, if the mocap actor does a backflip while holding nunchaku by the chain, both sticks of the nunchaku will be captured by the cameras moving in a realistic fashion. A traditional animator might not be able to physically simulate the movement of the sticks adequately given the actor's other motions. Secondary motion, such as the ripple of a body as an actor punches or is punched, requires higher speed and higher resolution as well as more markers.
On the negative side, mocap data requires special programs and time to manipulate once captured and processed, and if the data is wrong, it is often easier to throw it away and reshoot the scene rather than trying to manipulate the data. Many systems allow real time viewing of the data to decide if the take needs to be redone.
Another important point is that while it is common and comparatively easy to mocap a human actor in order to animate a biped model, applying motion capture to animals like horses can be difficult.
Motion capture equipment costs tens of thousands of dollars for the digital video cameras, lights, software, and staff to run a mocap studio, and this technology investment can become obsolete every few years as better software and techniques are invented. Some large movie studios and video game publishers have established their own dedicated mocap studios, but most mocap work is contracted to individual companies that specialize in mocap.
Video games use motion capture for football, baseball and basketball players or the combat moves of a martial artist.
Movies use motion capture for CG effects, in some cases replacing traditional cel animation, and for completely computer-generated creatures, such as Gollum, Jar-Jar Binks, and King Kong, in live-action movies.
Virtual reality and augmented reality require real-time input of the user's position and interaction with their environment, demanding more precision and speed than older motion capture systems could provide. Noise and errors from low-resolution or low-speed systems, and overly smoothed and filtered data with long latency, contribute to “simulator sickness,” where the lag and mismatch between visual and vestibular cues causes nausea and discomfort.
High-speed, high-resolution active marker systems can provide smooth data at low latency, allowing real-time visualization in virtual and augmented reality systems. The remaining challenge, now nearly within reach of powerful graphics cards, is mapping the images correctly to the real perspectives to prevent image mismatch.
Motion capture technology is frequently used in digital puppetry systems to aid in the performance of computer generated characters in real-time.
Facial motion capture is utilized to record the complex movements in a human face, especially while speaking with emotion. This is generally performed with an optical setup using multiple cameras arranged in a hemisphere at close range, with small markers glued or taped to the actor’s face.
Performance capture is a further development of these techniques, in which both body motions and facial movements are recorded. This technique was used in the making of The Polar Express, where all actors were animated this way.
Inertial systems use devices such as accelerometers or gyroscopes to infer positions and angles. They are often used in conjunction with other systems that provide periodic updates and a global reference, since inertial sensors measure only relative changes, not absolute position.
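The need for a global reference can be made concrete: position comes from double-integrating acceleration, so even a tiny constant sensor bias grows quadratically in time. A one-dimensional sketch with made-up numbers:

```python
# Dead reckoning from acceleration samples: integrate acceleration to
# velocity, then velocity to position. A constant bias in the
# accelerometer produces position error that grows with time squared.

def integrate_position(accels, dt):
    """Position after integrating acceleration samples, starting at rest."""
    v = p = 0.0
    for a in accels:
        v += a * dt      # acceleration -> velocity
        p += v * dt      # velocity -> position
    return p

# A small 0.01 m/s^2 bias, sampled at 100 Hz for 10 seconds, already
# drifts about half a metre with no real motion at all:
drift = integrate_position([0.01] * 1000, 0.01)
```

This quadratic drift is why the paragraph above pairs inertial sensors with an external system for absolute position fixes.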
RF (radio frequency) positioning systems are becoming more viable as higher-frequency RF devices allow greater precision than older RF technologies. The speed of light is about 30 centimeters per nanosecond (billionth of a second), so a 10 gigahertz (billion cycles per second) RF signal has a wavelength of about 3 centimeters, enabling accuracy on that order. By measuring amplitude to a quarter wavelength, it is possible to improve the resolution to about 8 mm. To achieve the resolution of optical systems, frequencies of 50 gigahertz or higher are needed, which are nearly line-of-sight and almost as easy to block as optical signals. Multipath and re-radiation of the signal are likely to cause additional problems, but these technologies will be well suited to tracking larger volumes with reasonable accuracy, since the resolution required at 100 meter distances isn't likely to be as high.
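The wavelength arithmetic above can be checked directly: wavelength is the speed of light divided by frequency, and measuring amplitude to a quarter wavelength, as described, gives the finer resolution figure.

```python
# RF wavelength and quarter-wavelength resolution, as in the text:
# a 10 GHz signal has a 3 cm wavelength, and amplitude measurement to
# a quarter wavelength gives roughly 8 mm resolution.

C = 3e8  # speed of light, m/s

def wavelength_m(freq_hz):
    """Wavelength in meters of an RF signal at freq_hz."""
    return C / freq_hz

def quarter_wave_resolution_mm(freq_hz):
    """Resolution (mm) when amplitude is measured to a quarter wavelength."""
    return wavelength_m(freq_hz) / 4 * 1000

ten_ghz_wavelength = wavelength_m(10e9)            # 0.03 m = 3 cm
ten_ghz_resolution = quarter_wave_resolution_mm(10e9)  # 7.5 mm, "about 8 mm"
```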
An alternative approach was developed by the Russian company VirtuSphere, in which the actor is given an unlimited walking area through the use of a rotating sphere, similar to a hamster ball, containing internal sensors that record its angular movements, removing the need for external cameras and other equipment. Even though this technology could potentially lead to much lower costs for mocap, the basic sphere is only capable of recording a single continuous direction; additional sensors worn on the person would be needed to record anything more.