Exploring Films with Computer Vision

Movie Scene Sensing

MOVIE SCENE SENSING IN USE

We tested Movie Scene Sensing (MSS) on Quentin Tarantino's Kill Bill Vol. 1 (2003), a film known for its fusion of visual styles and its nonlinear narrative. The tool processed 1,945 keyframes, extracted at visual and compositional transitions to capture the film's most representative moments.
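
The text does not say how the tool detects these transitions. As a minimal sketch, assuming a simple histogram-difference heuristic, a frame could be kept as a keyframe whenever its brightness distribution departs sharply from that of the previous frame. The function names, the bin count, and the threshold below are illustrative assumptions, not the tool's actual method.

```python
def histogram(frame, bins=8):
    """Normalized brightness histogram of a frame given as a flat list of 0-255 values."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [c / total for c in counts]


def select_keyframes(frames, threshold=0.5, bins=8):
    """Return indices of frames whose histogram differs from the previous frame's
    by more than `threshold` (L1 distance, range 0..2). Frame 0 is always kept."""
    if not frames:
        return []
    keyframes = [0]
    previous = histogram(frames[0], bins)
    for index in range(1, len(frames)):
        current = histogram(frames[index], bins)
        distance = sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            keyframes.append(index)  # large change: treat as a visual transition
        previous = current
    return keyframes
```

A brightness histogram only reacts to tonal changes. Detecting compositional transitions, as the tool is described as doing, would need richer features than this sketch uses.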

Exploring Kill Bill Vol. 1 with Computer Vision

IDENTIFYING CLUSTERS

These keyframes were organized into 27 clusters, highlighting the film's aesthetic, narrative, and thematic patterns.
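
The clustering algorithm and the features behind these 27 groups are not named in the text. As a hedged illustration only, the sketch below assumes each keyframe has already been reduced to a numeric feature vector and groups the vectors with plain k-means. The `kmeans` function and its farthest-point initialization are assumptions chosen to keep the example deterministic.

```python
import math


def kmeans(vectors, k, iterations=50):
    """Group feature vectors into k clusters; returns one cluster label per vector."""
    # Deterministic farthest-point initialization: start from the first vector,
    # then repeatedly add the vector farthest from the centroids chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(v, centroids[j])) for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, label in zip(vectors, new_labels) if label == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels
```

With real keyframes, the call would look like `kmeans(feature_vectors, 27)`, where `feature_vectors` is a hypothetical list holding one vector per keyframe.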

The clusters were classified into three main categories: (1) Tarantino's characteristic framing and camera styles; (2) visual and thematic references linked to genres like spaghetti westerns, anime, and martial arts cinema; and (3) character-centered narrative arcs that structure the story. This segmentation reflects the film's aesthetic and narrative choices, captured by the tool, and can be applied to analyze other works with similarly complex styles.

CAMERA STYLES: TARANTINO'S VISUAL SIGNATURES

By examining clusters grouped by compositional similarity, we can identify recurring camera styles and framings that reflect Tarantino's visual style and cinematic influences.

Feet and Shoes

This cluster gathers scenes showcasing Tarantino's visual obsession with feet, such as Beatrix attempting to regain movement after her coma, footsteps during tense moments (like Budd's arrival at Two Pines Chapel), battles with O-Ren Ishii, or the hospital escape in the Pussy Wagon.

Extreme Eye Close-ups

This cluster includes intense close-ups of eyes, inspired by Sergio Leone's spaghetti western duels and anime aesthetics. These framings appear in key confrontations, such as the duel between Beatrix and O-Ren Ishii and the fights with the Crazy 88 and Gogo Yubari, as well as in the animated flashbacks narrating O-Ren's origin.

AESTHETIC REFERENCES: FUSION OF GENRES AND STYLES

The tool also identified clusters capturing Tarantino's explicit stylistic references, including cinematic genres and specific color choices.

Yellow Bruce Lee

This cluster features frames highlighting Beatrix's yellow outfit, a direct tribute to Bruce Lee's attire in Game of Death (1978).

Anime

MSS clearly separated the animated sequence narrating O-Ren Ishii's origin, emphasizing the stylistic contrast with the rest of the film. This choice reflects the influence of anime such as Ninja Scroll and Ghost in the Shell. The model recognized how this transition in visual language alters the narrative tone without breaking the film's cohesion.

Close-ups in Yellow and Red

Movie Scene Sensing also captured the vibrant palette reminiscent of spaghetti westerns and samurai films. The juxtaposition of yellow and red is used in action moments to intensify the sense of movement, battle, and danger, a recurring element in the work of directors like Akira Kurosawa, whose films directly influenced Tarantino's aesthetics.

CHARACTERS AND NARRATIVE ARCS

The third category of clusters reflects the film's narrative organization, centered on revenge arcs connected to the main characters. The tool captured key moments of each arc, allowing exploration of how they unfold visually and narratively.

The tool separated scenes visually by identifying characters who recur within the same environment. This shows how the model detects stylistic variations without losing focus on the central elements of each scene.

MSS identified frames from Beatrix's first confrontation with Vernita Green, which takes place in a suburban house. The juxtaposition of domestic elements and violence highlights the aesthetic contrast with other revenge settings in Volume 1.

The arcs of Hattori Hanzo and Budd were located in the Two Pines Chapel massacre. The tool also found subgroups of characters associated with the film's main arc, the House of Blue Leaves, identifying battle scenes featuring specific characters such as the Crazy 88 and Gogo Yubari, as well as the duel with O-Ren Ishii.

CHRONOLOGICAL VISUALIZATION

Movie Scene Sensing also enables the chronological visualization of clusters, maintaining the original narrative sequence of the film. This functionality allows users to observe how mapped groups—initially arranged by thematic or compositional similarity—connect throughout the narrative progression, providing an overview of the relationship between identified moments and the structure of the work.
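
One way to picture this chronological view is as a timeline of cluster runs. The sketch below assumes each keyframe is stored as a hypothetical `(timestamp_seconds, cluster_id)` pair. It sorts the keyframes by time and collapses consecutive keyframes of the same cluster into a single segment. It illustrates the idea and is not the tool's implementation.

```python
def chronological_segments(keyframes):
    """Collapse (timestamp_seconds, cluster_id) pairs into time-ordered runs.

    Returns a list of (cluster_id, first_timestamp, last_timestamp) tuples, one
    per uninterrupted run of keyframes belonging to the same cluster.
    """
    segments = []
    for timestamp, cluster in sorted(keyframes, key=lambda kf: kf[0]):
        if segments and segments[-1][0] == cluster:
            # Same cluster as the previous keyframe: extend the current run.
            segments[-1] = (cluster, segments[-1][1], timestamp)
        else:
            segments.append((cluster, timestamp, timestamp))
    return segments
```

A cluster that appears in several separate segments recurs across the film, which is the kind of pattern this view is meant to expose.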

Mapping Cluster Distribution in Chronological Order

ORDERING CLUSTERS TEMPORALLY

Beyond facilitating the analysis of thematic cluster relationships, the chronological visualization helps identify recurring or evolving patterns throughout the narrative.

Applying this module of MSS to Kill Bill Vol. 1 revealed how Tarantino organizes stylistic and thematic elements to connect arcs and create transitions within the film’s nonlinear structure.

For filmmakers and editors, this feature serves as a resource for evaluating the impact of editing choices. For researchers, it provides an exploratory tool for studying the logic behind the structure of complex films.

MAPPING NARRATIVE PATTERNS

The thematic mapping module of Movie Scene Sensing offers a visualization strategy dedicated to exploring similarities and emerging patterns that might be harder to identify through a chronological reading of frames.

Movie Scene Sensing leverages the features of keyframes within clusters identified by computer vision models to map them into a two-dimensional space. This approach allows for an intuitive exploration of how specific compositional patterns and narrative axes in Kill Bill interact, independent of temporal constraints.
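
The projection method is not specified in the text. As a sketch under that caveat, the example below assumes principal component analysis (PCA), computed with power iteration so that it needs only the standard library. Nonlinear methods such as t-SNE or UMAP are also common for maps of this kind. All function names are illustrative.

```python
import math
import random


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _top_eigenvector(matrix, iterations=200, seed=0):
    """Power iteration: dominant eigenvector and eigenvalue of a symmetric matrix."""
    rng = random.Random(seed)
    vector = [rng.uniform(-1.0, 1.0) for _ in matrix]
    for _ in range(iterations):
        product = [_dot(row, vector) for row in matrix]
        norm = math.sqrt(_dot(product, product))
        if norm == 0.0:
            break  # degenerate matrix: keep the current vector
        vector = [x / norm for x in product]
    eigenvalue = _dot(vector, [_dot(row, vector) for row in matrix])
    return vector, eigenvalue


def project_2d(features):
    """Project feature vectors onto their first two principal components."""
    n, dim = len(features), len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(dim)]
    centered = [[row[j] - means[j] for j in range(dim)] for row in features]
    covariance = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]
    first, eigenvalue = _top_eigenvector(covariance)
    # Deflation: remove the first component so power iteration finds the second.
    deflated = [
        [covariance[i][j] - eigenvalue * first[i] * first[j] for j in range(dim)]
        for i in range(dim)
    ]
    second, _ = _top_eigenvector(deflated)
    return [(_dot(row, first), _dot(row, second)) for row in centered]
```

Keyframes with similar features land near each other in the returned coordinates, which is what allows the map to be read independently of the film's timeline.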

Exploring Visual Narratives Beyond Time Linearity

IDENTIFYING ARCS, STYLES AND THEMES IN THE NARRATIVE

The arrangement of clusters on the map reveals the interactions that shape the film's narrative and visual aesthetic. Sequences connected by themes or visual styles appear close together, while contrasting patterns occupy more distant or marginal positions.
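
Proximity on the map can be made concrete by comparing cluster centroids. The following sketch assumes each keyframe already has 2D map coordinates and a cluster label. The helper names are hypothetical, and Euclidean distance is an assumption about how closeness might be measured.

```python
import math
from collections import defaultdict


def cluster_centroids(points, labels):
    """Mean 2D position of each cluster on the map."""
    grouped = defaultdict(list)
    for point, label in zip(points, labels):
        grouped[label].append(point)
    return {
        label: (
            sum(x for x, _ in members) / len(members),
            sum(y for _, y in members) / len(members),
        )
        for label, members in grouped.items()
    }


def nearest_cluster(centroids, label):
    """Label of the cluster whose centroid lies closest to the given cluster's."""
    others = (other for other in centroids if other != label)
    return min(others, key=lambda other: math.dist(centroids[label], centroids[other]))
```

Clusters that repeatedly turn out to be each other's nearest neighbors correspond to the sequences the map places close together.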

This spatial arrangement shows how Tarantino builds connections between seemingly disconnected moments in Kill Bill Vol. 1.

NARRATIVE ARCS I: THE FIRST STEPS OF REVENGE

The clusters from the initial arcs (Hospital Breakout and Vernita Green) are positioned close to each other on the map, reflecting stylistic similarities identified by the tool. Although they belong to distinct narrative segments, the grouping highlights shared visual characteristics, such as lighting tones and framing.

The Hospital Breakout cluster includes scenes from the hospital corridor, the interior of the Pussy Wagon, and Beatrix's struggle to regain movement. In contrast, the Vernita Green cluster groups frames from Beatrix's first confrontation in a domestic setting, characterized by visual elements that stand out from other locations in the film, such as the House of Blue Leaves.

Movie Scene Sensing demonstrated its ability to identify not only visual patterns but also stylistic and narrative connections, organizing diverse moments coherently and highlighting their significance within the film’s structure.

NARRATIVE ARCS II: THE CENTRALITY OF THE HOUSE OF BLUE LEAVES

The map reveals that the House of Blue Leaves arc occupies a central position, reflecting its importance as the narrative and stylistic core of Kill Bill Vol. 1.

The tool also identified subclusters within this sequence, mapping not only visual similarities but also narrative and stylistic relationships that characterize Tarantino’s work.

At the lower edge of the map, the Anime Sequence, which narrates O-Ren Ishii's origin, was highlighted as a stylistic rupture due to its distinct visual language, while still connected to the central arc.

On the opposite side, the subcluster of the battle with the Crazy 88, O-Ren's elite group, was segmented by its use of vibrant colors and the transition to black and white.

The Gogo Yubari subcluster, representing O-Ren’s bodyguard, captured close-ups of her weapon and expressive gazes. Meanwhile, the final duel between Beatrix and O-Ren stands slightly apart from the others, characterized by the predominance of wide shots and minimalist cinematography. These subclusters converge at the center of the map, consolidating the narrative axis of O-Ren Ishii.

CLOSE-UPS AND CONNECTIONS

Clusters related to visual styles, such as Feet Close-ups, Extreme Eye Close-ups, and Object Close-ups, appear near the center of the map generated by the tool. This indicates recurring visual characteristics captured by the models, suggesting the prominence of these compositional choices in the visual construction of Kill Bill Vol. 1.

The Feet Close-ups cluster connects scenes like Beatrix’s recovery in the hospital and strategic movements in the House of Blue Leaves, highlighting how this framing recurs during moments of preparation or movement.

The Extreme Eye Close-ups, on the other hand, appear during high-tension transitions, such as duels with O-Ren and Gogo Yubari, as well as flashbacks that emphasize the emotional weight of the narrative.

CLIPS AND AUDIOVISUAL MIXING

The clip production module in Movie Scene Sensing expands both analytical and creative possibilities by exporting the keyframes of each cluster as MPEG micro clips, retrieving the original film scenes associated with those frames.

This allows researchers to visually explore stylistic and thematic patterns in isolation within a continuous sequence representing the entire cluster.

The Mix mode enables the combination of clusters, creating clips that explore connections between different groups. This functionality allows for generating thematic or stylistic combinations, enabling visual comparisons between groups or creating new interpretations based on detected patterns.

The clips can be used for detailed analyses or as material for creative experiments, offering practical ways to study or reinterpret the sequences identified by the tool.

Creating Clips and Combining Clusters

TRANSFORMING CLUSTERS INTO CLIPS

Movie Scene Sensing includes a module for transforming clusters into clips. Using the grouped keyframes, the tool can generate videos that synthesize each cluster’s visual characteristics, such as framing, colors, or compositional patterns.
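
How the tool cuts these clips is not described. As a minimal sketch, assuming each keyframe carries a timestamp in seconds, the example below pads every timestamp into a short window, merges windows that overlap, and builds an `ffmpeg` command for each window. The function names, the padding value, and the file names are hypothetical. The `-ss`, `-t`, `-i`, and `-c copy` options are standard ffmpeg options.

```python
def clip_windows(timestamps, padding=1.0):
    """Turn keyframe timestamps (seconds) into merged (start, end) clip windows."""
    windows = []
    for t in sorted(timestamps):
        start, end = max(0.0, t - padding), t + padding
        if windows and start <= windows[-1][1]:
            # Overlaps the previous window: extend it instead of starting a new clip.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows


def ffmpeg_args(source, window, output):
    """Argument list that copies one window out of the source file without re-encoding."""
    start, end = window
    return [
        "ffmpeg",
        "-ss", f"{start:.3f}",       # seek to the window start
        "-t", f"{end - start:.3f}",  # read only the window's duration
        "-i", source,
        "-c", "copy",                # stream copy, no re-encoding
        output,
    ]
```

Each argument list could be passed to `subprocess.run`. Because stream copy can only cut at the video's own encoded keyframes, clip boundaries are approximate. Re-encoding would give exact cuts at the cost of speed.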

MIXING CLUSTERS

The mixing module takes this functionality further, allowing clusters to be combined in a customizable way to create hybrid clips that synthesize multiple thematic or visual axes of the film. This functionality operates in two modes: chronological and hierarchical.

Chronological Mode

In chronological mode, keyframes from different clusters are organized according to their appearance in the film, resulting in an edit that respects the original narrative progression.

Hierarchical Mode

In hierarchical mode, the order of clusters is defined by the user, enabling the creation of sequences that prioritize specific themes or styles, reorganizing the film’s elements according to a customized logic.
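
The difference between the two modes comes down to the sort key. The sketch below assumes keyframes stored as hypothetical `(timestamp_seconds, cluster_id)` pairs. Ordering keyframes by time inside each cluster in hierarchical mode is also an assumption, since the text only says that the user defines the order of the clusters.

```python
def mix_clusters(keyframes, cluster_order, mode="chronological"):
    """Order keyframes from the selected clusters for a mixed clip.

    keyframes: list of (timestamp_seconds, cluster_id) pairs.
    cluster_order: the clusters to include; in hierarchical mode, also their order.
    """
    rank = {cluster: position for position, cluster in enumerate(cluster_order)}
    selected = [kf for kf in keyframes if kf[1] in rank]
    if mode == "chronological":
        # Follow the film's own timeline, interleaving the clusters.
        return sorted(selected, key=lambda kf: kf[0])
    if mode == "hierarchical":
        # Follow the user's cluster order first, then time within each cluster.
        return sorted(selected, key=lambda kf: (rank[kf[1]], kf[0]))
    raise ValueError(f"unknown mode: {mode}")
```

Calling the function twice with the same clusters and different modes produces the two edits described above from the same set of keyframes.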

In the example provided, we used the Feet and Shoes clusters, which appear at different narrative moments in the film, to create a mix that connects these scenes into a single continuous sequence. The resulting montage brings together moments where the focus on feet is used as a narrative element.

AVAILABILITY AND ACCESS

Currently, Movie Scene Sensing is under development and limited to ongoing research within the laboratory. It is not yet available for general use or broad distribution.

Collaborations may be considered for specific projects aligned with the lab's research agenda and team availability. Proposals for collaborations or partnerships will be reviewed based on their potential contribution to ongoing investigations.

If you are interested in exploring collaboration or investment opportunities, please contact the group leader for preliminary discussions: eliasbitencourt@gmail.com.

Using Movie Scene Sensing in Research