Telecine? Interlace? WTF?

This page is intended to give an overview of different types of field-based video content and how to approach preparing that video content for viewing on a typical modern display. This page will not necessarily cover every possible scenario that may be encountered, but should equip readers with a clearer understanding of the underlying concepts.

Some terms may have slightly different technical definitions in different contexts, but they should have consistent meanings within this page and generally represent common usage.

What is a video? (Fields, Frames, and Framerate)

A video is just a series of pictures displayed in sequence at specific intervals. Each picture that is displayed is a frame, and how quickly these pictures are displayed is known as the framerate. The framerate of a video is usually measured in frames per second (fps).
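As a tiny illustration (using a made-up example framerate, not a value from this page), the framerate directly determines how long each picture stays on screen and when it appears:

    # Illustrative only: at a given framerate, each frame is shown for 1/fps seconds.
    fps = 24                                        # assumed example framerate
    frame_duration = 1 / fps                        # seconds each picture stays on screen
    print(round(frame_duration * 1000, 1))          # ~41.7 ms per frame
    print([n * frame_duration for n in range(4)])   # times at which frames 0-3 appear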

If you go out and buy a DVD, put it in your computer, and play the video in a software player (MPC-HC, mpv, VLC, etc.), you will be presented with a series of frames. Each frame may or may not be made up of fields, and that is what distinguishes frame-based from field-based content.

Frame-based Content

Frame-based content is very simple. It is just video content where each frame is a complete picture on its own. There are no fields contained in the frame. Completely straightforward, "normal" video content that can be viewed easily on modern displays.

Frame-based content is also commonly known as progressive content. That term is a little nuanced, so it won't be explored or used in this page. But it may be used in other discussions of these topics, and often you can treat the terms as synonyms.

The goal of this page is to explain how to obtain frame-based content from field-based content.

Field-based Content

Field-based content is a little more complicated. Each frame of the video content contains two fields. Each field is part of a picture and has the same width as the frame which contains it, but only half the height. Instead of being stored as two half-height pictures stacked on top of each other, the fields are stored in alternating rows.

Because of this storage method the two fields are referred to as either even and odd, or as top and bottom; the two naming schemes are used interchangeably.

This storage method also makes the field-based nature of the video most apparent when there is high motion. Since the top and bottom fields won't be displaying the same point in time, the picture will appear to be combed. The presence of this artifact is a clear indication that a video is field-based.
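To make this concrete, here is a small toy Python sketch (purely illustrative, not tied to any real video library) that models a frame as a list of text rows and weaves two fields into it. The field contents are made up; note how the rows alternate and how motion between the two fields produces exactly the combing described above.

    # Toy model: a "field" is a list of rows, and a frame interleaves two fields
    # row by row. All data here is made up purely to illustrate the layout.

    def weave(top_field, bottom_field):
        """Interleave two half-height fields into one full-height frame."""
        frame = []
        for top_row, bottom_row in zip(top_field, bottom_field):
            frame.append(top_row)      # rows 0, 2, 4, ... come from the top field
            frame.append(bottom_row)   # rows 1, 3, 5, ... come from the bottom field
        return frame

    # Two fields captured at different points in time: the "ball" has moved
    # between them, which is what produces visible combing.
    top    = ["..O...", "..O...", "..O..."]
    bottom = ["....O.", "....O.", "....O."]

    for row in weave(top, bottom):
        print(row)
    # The output alternates between the two positions line by line:
    # ..O...
    # ....O.
    # ..O...
    # ...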

An unfortunately common misconception is that the presence of combing artifacts always means that the video is interlaced. It is a major goal of this page to correct this misunderstanding. If you take only one thing from this page, know that you should NOT always de-interlace a video to "fix" combing. Continue reading to understand why.

With a general understanding of fields and frames established, let's move on to the common types of field-based content.

Interlaced Content

What is interlaced content?

Interlaced content is the most straightforward type of field-based content, but not always the most common. It is any field-based video content where each field stores a complete picture.

Why would video be interlaced?

An interlaced video represents twice as many points in time as a frame-based video of the same framerate, but only represents half of the vertical picture of an equivalent frame-based video at each point in time. Essentially the method trades the spatial resolution of a video for temporal resolution. That is to say that an interlaced video is able to capture smoother motion than the framerate would otherwise allow, but is worse at capturing the details of each frame.
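As a quick numeric illustration (the numbers below are made-up examples, not values from this page), the same storage budget buys either more pictures per second or more lines per picture:

    # Illustrative numbers only: what each approach captures per second when
    # both are stored at the same nominal framerate and frame height.
    height = 480      # assumed example frame height in lines
    framerate = 30    # assumed example framerate in frames per second

    # Frame-based: one full-height picture per frame.
    print(framerate, "pictures per second,", height, "lines each")

    # Field-based (interlaced): two half-height fields per frame, each
    # captured at its own point in time.
    print(framerate * 2, "pictures per second,", height // 2, "lines each")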

What to do with an interlaced video?

De-interlacing is the process of turning the sequence of fields in a video into a new sequence of frames. In its simplest form, this separates the two fields of each frame and then doubles their height. The result is a frame-based output video with the same width and height, but double the framerate of the field-based input video.
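Continuing the toy model from earlier (kept self-contained here), this sketch shows that simplest form: pull the two fields apart, naively restore full height by repeating lines, and output them as separate frames. Real de-interlacers interpolate the missing lines far more intelligently, and which field comes first in time depends on the source; top-field-first is just an assumption below.

    # Toy model of the simplest possible de-interlace: split each field-based
    # frame back into its two fields, then restore full height by repeating lines.

    def split_fields(frame):
        """Return (top_field, bottom_field) from a frame stored as a list of rows."""
        return frame[0::2], frame[1::2]   # rows 0, 2, 4, ... and rows 1, 3, 5, ...

    def double_height(field):
        """Naively double a field's height by repeating every line."""
        return [row for line in field for row in (line, line)]

    def deinterlace(frames, top_field_first=True):
        """Turn N field-based frames into 2 * N full-height frames."""
        out = []
        for frame in frames:
            top, bottom = split_fields(frame)
            first, second = (top, bottom) if top_field_first else (bottom, top)
            out.append(double_height(first))    # earlier point in time
            out.append(double_height(second))   # later point in time
        return out

    # One 6-row field-based frame in, two 6-row frames out: same width and
    # height, twice the framerate.
    example = ["..O...", "....O.", "..O...", "....O.", "..O...", "....O."]
    for new_frame in deinterlace([example]):
        print(new_frame)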

Telecined Content

What is telecined content?

Telecined content is a more complicated type of field-based content. It is field-based video content where each field represents only half of a complete picture, and which contains some duplicate fields. That's intentionally a little vague because there are several different ways a video could hypothetically be telecined.

To keep things clear, this page will focus only on what is likely the most common type of telecining, known as a "2:3 pulldown" (or "3:2 pulldown"). This specific type of telecining means that for every group of 10 fields (5 field-based frames) there are 4 unique pictures (4 frame-based frames). This grouping of 10 fields is referred to as a cycle. It contains an integer number (5) of field-based frames and an integer multiple (2) of the pulldown's length (2 + 3 = 5).

The #:# naming convention denotes how many consecutive output fields are taken from each input frame. So if we consider input frames A B C D, then a single cycle of a 2:3 pulldown would have 2 fields from A, then 3 fields from B, 2 fields from C, and 3 fields from D: A A B B B C C D D D.
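Here is the same cycle expressed as a short toy sketch (labels only, no real video data). It ignores which of a frame's two fields gets repeated; only the counting pattern matters for this illustration.

    # Toy model of one 2:3 pulldown cycle: 4 input frames become 10 fields,
    # which are then paired back up into 5 field-based frames.

    def pulldown_23(frames):
        pattern = [2, 3]                      # the "2:3" in the name
        fields = []
        for i, frame in enumerate(frames):
            fields.extend([frame] * pattern[i % 2])
        # Pair consecutive fields into (top, bottom) field-based frames.
        return [(fields[i], fields[i + 1]) for i in range(0, len(fields), 2)]

    print(pulldown_23(["A", "B", "C", "D"]))
    # [('A', 'A'), ('B', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'D')]
    # 4 frames in, 5 field-based frames out, which is how 24fps becomes 30fps.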

Why would video be telecined?

The benefit is less obvious here, but ultimately comes down to differences between production and distribution formats. If a company produces a video at 24fps and wants to distribute it on TV or home video, which only support 30fps, then they need some method of converting the content, and telecining is the most common method. With a 2:3 pulldown, every 4 original frames become 5 field-based frames, which turns 24fps into 30fps.

There is no benefit to telecining a video if its original framerate is supported by its distribution channel. For example, there is basically never a need to telecine a video for an internet streaming platform, as they support any common framerate. (Though content originally produced for home video that is now hosted on a streaming service may still be telecined.)

What to do with telecined video?

Inverse Telecine (aka IVTC) is the process of undoing a telecine to obtain a frame-based video. There are some nuances to different methods, but broadly this consists of two logical operations which may happen separately or in one pass: field matching and decimation. We will examine these two operations separately first.

If we consider a frame-based video A B C D that has been telecined as

A  B  B' C  D
A  B  C  D  D'

we can arrive at our original frames by considering each top field and matching it with a bottom field. For example

A  B  B' C  D
|  | /  /   |
A  B  C  D  D'

Which arrives at

A B B C D

Note that we could have selected either bottom field to match with the top field of D. If the telecine was performed correctly then it doesn't matter which one is used.

Alternatively we could arrive at the original frames by considering each bottom field and matching it with a top field. For example

A  B  B' C  D
|  |    /  /|
A  B  C  D  D'

Which arrives at

A B C D D

And in this case we could have selected either top field to match with the bottom field of B. Again it shouldn't matter in the ideal case.

Whether we match by top or bottom doesn't really matter; either way we still have all the original input frames present, plus one duplicate. The only difference is whether it's a duplicate of B or D.

Decimation then simply drops the duplicate frame to arrive at our original A B C D.
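The whole two-step process can be mimicked on the toy model above, assuming an ideal telecine where repeated fields are exact copies. This is only a sketch of the logic; real tools compare actual image content rather than labels.

    # Toy model of two-step IVTC on the telecined cycle from earlier.
    telecined = [("A", "A"), ("B", "B"), ("B", "C"), ("C", "D"), ("D", "D")]

    def field_match_by_top(frames):
        """Keep each frame's top field and pair it with a bottom field (from this
        frame or the previous one) that belongs to the same original picture."""
        matched = []
        for i, (top, bottom) in enumerate(frames):
            candidates = [bottom]                    # this frame's bottom field
            if i > 0:
                candidates.append(frames[i - 1][1])  # previous frame's bottom field
            # In this toy model "same picture" is just label equality; real field
            # matchers compare image content instead (see the heuristics below).
            matched.append(next(c for c in candidates if c == top))
        return matched

    def decimate(frames, cycle=5):
        """Drop one duplicate frame per cycle to restore the original frame count."""
        out = []
        for start in range(0, len(frames), cycle):
            chunk = frames[start:start + cycle]
            for i in range(1, len(chunk)):
                if chunk[i] == chunk[i - 1]:   # found the duplicate pair
                    del chunk[i]
                    break
            out.extend(chunk)
        return out

    matched = field_match_by_top(telecined)
    print(matched)            # ['A', 'B', 'B', 'C', 'D']
    print(decimate(matched))  # ['A', 'B', 'C', 'D']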

Automated IVTC will generally perform these two steps separately and will use heuristics for both steps. How well those heuristics perform determines how well the content is IVTC'd.

Generally the heuristic for field matching is how "combed" a possible match is. If you match fields from different frames you expect the result to be more combed than if you matched fields from the same frame.

For decimation the heuristic is generally similarity to a neighboring frame. Ideally there should be one pair that matches exactly, but for various reasons there might not be an exact match. Regardless, dropping one frame from the closest pair will restore the correct frame count.
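Specific tooling is outside the scope of this page, but as one hedged real-world illustration: in a VapourSynth script with the VIVTC plugin available, an automated two-step IVTC along these lines commonly looks something like the sketch below. The source filter and file name are placeholders, and parameters such as the field order depend entirely on the actual source.

    # Sketch of an automated two-step IVTC in VapourSynth, assuming the VIVTC
    # plugin is installed. "input.vob" and the source filter are placeholders.
    import vapoursynth as vs

    core = vs.core
    clip = core.lsmas.LWLibavSource("input.vob")   # assumed source filter

    # Field matching: for each frame, keep the field combination that looks the
    # least combed. order=1 assumes a top-field-first source.
    matched = core.vivtc.VFM(clip, order=1)

    # Decimation: within each cycle of 5 frames, drop the frame most similar to
    # its neighbor, restoring the original frame count (e.g. 30fps back to 24fps).
    decimated = core.vivtc.VDecimate(matched)

    decimated.set_output()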

However, these heuristics are not infallible. Consider what happens if the wrong field is matched: you may complete field matching with a combed frame, for example A B BC C D. If you then decimate by dropping one of the most similar pair, C and D might be closer to each other than B or C is to the combed BC, so one of them gets dropped. The final output may be A B BC D, which now contains a combed frame and may look "jerky" or "juddery" in motion because half of C is missing.

Manual IVTC then allows you to make field matching and decimation decisions yourself, to correct situations where the heuristics failed. You can still keep the steps separated when you manually IVTC, but there's little benefit to a human operator in doing so. Instead you can consider performing both operations simultaneously. For example

A  B  B' C  D
|  |    /   |
A  B  C  D  D'

Produces A B C D directly.
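In the toy model, a manual pass amounts to writing those decisions down explicitly instead of letting heuristics guess them. The decision list below simply encodes the diagram above; it is an illustration of the idea, not any particular tool's format.

    # Toy model of manual IVTC: each entry says which telecined frame supplies
    # the top field and which supplies the bottom field of an output frame.
    # Frames that are never referenced are dropped, so field matching and
    # decimation happen in a single pass.
    telecined = [("A", "A"), ("B", "B"), ("B", "C"), ("C", "D"), ("D", "D")]

    decisions = [(0, 0), (1, 1), (3, 2), (4, 4)]   # (top from frame i, bottom from frame j)

    restored = [(telecined[i][0], telecined[j][1]) for i, j in decisions]
    print(restored)  # [('A', 'A'), ('B', 'B'), ('C', 'C'), ('D', 'D')]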