
Video capture and processing ideas and techniques


This page exists to list/discuss options for video capture and processing.

In order to capture video on Android devices we can either use the Camera class and save "preview" frames into a video file or we can use the MediaRecorder class and capture into a 3gp or mp4 file. The MediaRecorder class offers the best options for compression and format but doesn't offer any capabilities to modify the video as it is being captured. The Camera/preview method offers us the ability to modify and process the video as it is being captured but doesn't offer any advantages for compression or format.
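
As a point of reference, here is a minimal sketch of the Camera/preview approach, assuming the default NV21 preview format; the class name and output path are illustrative only. Each preview frame can be processed and, for example, compressed to JPEG before being written out:

```java
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import android.graphics.ImageFormat;
import android.graphics.Rect;
import android.graphics.YuvImage;
import android.hardware.Camera;

// Illustrative callback: register with camera.setPreviewCallback(new PreviewFrameGrabber()).
public class PreviewFrameGrabber implements Camera.PreviewCallback {

    private int frameCount = 0;

    public void onPreviewFrame(byte[] data, Camera camera) {
        Camera.Size size = camera.getParameters().getPreviewSize();

        // Wrap the raw NV21 preview bytes and compress the frame to JPEG.
        YuvImage yuv = new YuvImage(data, ImageFormat.NV21, size.width, size.height, null);
        ByteArrayOutputStream jpeg = new ByteArrayOutputStream();
        yuv.compressToJpeg(new Rect(0, 0, size.width, size.height), 80, jpeg);

        try {
            // Placeholder path; a real implementation would hand the bytes to
            // whichever muxer/stream we settle on below.
            FileOutputStream out = new FileOutputStream(
                    "/sdcard/frames/frame_" + (frameCount++) + ".jpg");
            out.write(jpeg.toByteArray());
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```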

## Option 1: Camera Preview Images saved as Motion JPEG

Motion JPEG is a format used by many older still cameras (when in video capture mode) and commonly used by IP cameras to deliver video to a webpage.

### Option 1A: IP Camera Stream

IP cameras typically send, via HTTP, a response header followed by individual JPEG images with MIME headers in between:
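
Roughly speaking, the stream is a single multipart/x-mixed-replace response whose parts are JPEG frames. A sketch of the sending side (class and method names here are made up for illustration):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class MjpegStreamSketch {

    private static final String BOUNDARY = "frame";

    // Hypothetical source of already-compressed JPEG frames
    // (e.g. fed from the camera preview callback).
    public interface FrameSource {
        byte[] nextJpegFrame();
    }

    public static void serveOneClient(ServerSocket server, FrameSource frames)
            throws IOException {
        Socket client = server.accept();
        OutputStream out = client.getOutputStream();

        // One response header for the whole stream.
        out.write(("HTTP/1.0 200 OK\r\n"
                + "Content-Type: multipart/x-mixed-replace; boundary=" + BOUNDARY
                + "\r\n\r\n").getBytes());

        while (true) {
            byte[] jpeg = frames.nextJpegFrame();
            // Each frame is its own MIME part.
            out.write(("--" + BOUNDARY + "\r\n"
                    + "Content-Type: image/jpeg\r\n"
                    + "Content-Length: " + jpeg.length + "\r\n\r\n").getBytes());
            out.write(jpeg);
            out.write("\r\n".getBytes());
            out.flush();
        }
    }
}
```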

Benefits: It is quick and dirty. Perhaps perfect for a proof of concept demo. Allows processing of the video frames as they come in.

Drawbacks: This format, although playable by some software (e.g. VLC), is not universally playable or sharable. It also has no way to carry audio. Finally, it doesn't take advantage of modern codecs, so the files can become very large.

Deal Breaker: This format has no FPS control; playback software will play the frames back as fast as possible. It is really only meant for streaming images as they are captured.

Example: https://github.com/vanevery/Android-MJPEG-Test

### Option 1B: MJPEG as MOV or AVI

Still cameras typically save Motion JPEG video to either a MOV or an AVI file. These can include sound and should be playable by a wide variety of software.

The following two links offer examples (Creative Commons 3.0 Attribution License) for writing these types of files in JSE. They will require a fair amount of work to port to Android, although the author seems willing to do it himself with the right bribe (wine/books):

Here is the basis for the QuickTime version (it may be more straightforward, with fewer JSE dependencies):

Benefits: It is a proper format that is sharable and close to universally playable. It allows for processing of video as it comes in and supports audio.

Drawbacks: It doesn't take advantage of modern video codecs, so the files can become very large. It will take some work to port the above classes to Android.

## Option 2: Camera Preview Images sent to FFMPEG

FFMPEG has been compiled on Android successfully. It may be used either as a library that we build some native code around or as a standalone command-line app that we call to perform various operations.

The Bambuser sources don't contain many (any) additional libraries other than stock FFMPEG. In order to use MPEG-4/H.264/AAC we'll have to work on getting those encoders to compile. Otherwise we can probably open/save video in the stock formats/encodings (TBD).

### Option 2A: FFMPEG command line

This option would require saving the individual frames and the audio track out to the SD card and then passing them en masse to FFMPEG for encoding/compiling into a video file of our choosing.
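
For what it's worth, ffmpeg's image2 demuxer does accept numbered image sequences as input, so once the frames and audio are on the card the encode step could look roughly like the following. The binary location, file names, and frame rate are assumptions for illustration:

```java
import java.io.IOException;

public class FfmpegEncodeStep {

    public static void encodeCapturedFrames() throws IOException, InterruptedException {
        String[] cmd = {
            "/data/data/org.example.app/ffmpeg",    // wherever we unpack our compiled binary
            "-r", "15",                             // frame rate of the captured JPEG sequence
            "-i", "/sdcard/frames/frame_%05d.jpg",  // numbered frames from the preview callback
            "-i", "/sdcard/frames/audio.wav",       // separately captured audio track
            "/sdcard/capture.mp4"                   // output container of our choosing
        };
        Process ffmpeg = Runtime.getRuntime().exec(cmd);
        ffmpeg.waitFor();
    }
}
```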

Benefits: It allows for processing of video as it comes in and supports audio. It is a proper format that is sharable and close to universally playable.

Drawbacks: The intermediate step requires significant amounts of disk space. I have never seen a command-line usage of FFMPEG that does this, so it may not even be possible ;-)

Example: https://github.com/vanevery/Android-MJPEG-Video-Capture-FFMPEG

Current Issues: We lose a few generations of compression in the way things are currently being processed.

### Option 2B: FFMPEG as library

This option would require building a C/C++ application around FFMPEG that receives the individual frames and audio and feeds them into a video file in the format of our choosing.
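
The Java side of such a bridge might look something like the following; the native method names and signatures are assumptions for illustration, not an existing API:

```java
// Java half of a hypothetical JNI bridge to a native encoder built on
// FFMPEG's libavcodec/libavformat.
public class NativeEncoder {

    static {
        // Loads libnative-encoder.so, the C/C++ piece we would write around FFMPEG.
        System.loadLibrary("native-encoder");
    }

    // Open the output file and set up the encoder (hypothetical signature).
    public native int init(String outputPath, int width, int height, int frameRate);

    // Feed one NV21 preview frame; the native side converts and encodes it.
    public native int encodeFrame(byte[] nv21Frame, long presentationTimeMs);

    // Feed a chunk of PCM audio (e.g. from AudioRecord).
    public native int encodeAudio(byte[] pcm, int length);

    // Flush the encoders, write the trailer, and close the file.
    public native int finish();
}
```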

Benefits: It allows for processing of video as it comes in and supports audio. It is a proper format that is sharable and close to universally playable.

Drawbacks: We are not experts in FFMPEG and development of this app may require significant external help.

## Option 3: Capture video with MediaRecorder (possible live processing of preview frames or during playback)

This option would leverage the built-in codecs and formats, offering the highest quality video. The preview frames may be used for live face detection/processing, saving only data about the location of faces and so on (rather than the actual preview images). A post-processing step would leverage FFMPEG or another library to decode the video, apply effects, and then re-encode for saving/sharing.
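
A minimal sketch of the MediaRecorder capture step (output path and encoder choices are illustrative; error handling omitted):

```java
import java.io.IOException;

import android.media.MediaRecorder;
import android.view.Surface;

public class MediaRecorderCapture {

    public static MediaRecorder startRecording(Surface previewSurface) throws IOException {
        MediaRecorder recorder = new MediaRecorder();
        recorder.setAudioSource(MediaRecorder.AudioSource.MIC);
        recorder.setVideoSource(MediaRecorder.VideoSource.CAMERA);
        recorder.setOutputFormat(MediaRecorder.OutputFormat.MPEG_4);
        recorder.setAudioEncoder(MediaRecorder.AudioEncoder.AMR_NB);
        recorder.setVideoEncoder(MediaRecorder.VideoEncoder.H264);
        recorder.setOutputFile("/sdcard/capture.mp4"); // illustrative path
        recorder.setPreviewDisplay(previewSurface);
        recorder.prepare();
        recorder.start();
        return recorder; // caller stops with recorder.stop() and recorder.release()
    }
}
```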

With ffmpeg we could transcode a recorded video (full native format & framerate) and write our own filter (http://wiki.multimedia.cx/index.php?title=FFmpeg_filter_HOWTO) that does the redaction based on a second file that stores the redaction regions. It could also dump the redacted information to a separate stream, and with ffmpeg we might be able to embed this, encrypted, into the container.

[2:12pm] vanevery: n8fr8, have a sec?
[2:12pm] n8fr8: sure
[2:13pm] vanevery: I have been playing with ffmpeg's libavfilter
[2:13pm] vanevery: it is pretty powerful and I think will do much of what we want
[2:14pm] vanevery: In the end, we'll have to build ourselves a filter that gives all of the options we need
[2:14pm] vanevery: (which is what asenior suggested)
[2:14pm] vanevery: For the prototype, here is what I am thinking
[2:14pm] n8fr8: ok
[2:14pm] vanevery: Since FFMPEG has a built-in overlay filter
[2:14pm] vanevery: that works well
[2:15pm] vanevery: but doesn't support moving or an easy way to do specific times
[2:15pm] vanevery: I think that we should create two videos
[2:15pm] vanevery: 1:  The actual normal video capture
[2:15pm] vanevery: 2:  A series of bitmaps that we overlay on the first
[2:16pm] vanevery: the series of bitmaps can be created at the same time we are capturing the video
[2:16pm] vanevery: they will have the regions where the fingers specify obscured
[2:16pm] vanevery: does that make a little sense?
[2:19pm] djhalliday: basically of the same size / shape / location as what the user is choosing at the time of capture?
[2:20pm] vanevery: Yes..  the second movie is that..  The regions, and will be used to overlay on the first video
[2:21pm] n8fr8: and the second movie can have an alpha channel?
[2:21pm] vanevery: Right
[2:21pm] vanevery: It can be a series of PNGs that FFMPEG makes into a movie
[2:21pm] vanevery: and then we overlay the two
[2:21pm] n8fr8: and if they want pixelization we can grab the pixels from the first movie as well
[2:21pm] vanevery: right
[2:22pm] vanevery: (although more difficult)
[2:22pm] vanevery: The second doesn't actually have to be created in real time either, just at some point in the processing
[2:22pm] vanevery: The data can be logged and then a thread can create the movie
[2:23pm] n8fr8: right
[2:23pm] n8fr8: that makes sense... we can log, preview in real time, etc
[2:23pm] vanevery: Right..
[2:23pm] n8fr8: then do the muxing
[2:23pm] vanevery: right
[2:23pm] n8fr8: cool!
[2:23pm] vanevery: Ok..  Wanted a sanity check before I dug in
[2:23pm] vanevery: Thanks 
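
A sketch of the overlay/muxing step described above, assuming a compiled ffmpeg binary and illustrative paths; the exact overlay filter syntax depends on the ffmpeg version (older builds use the movie filter with -vf instead of -filter_complex):

```java
import java.io.IOException;

public class RedactionOverlayStep {

    public static void overlayMasks() throws IOException, InterruptedException {
        String[] cmd = {
            "/data/data/org.example.app/ffmpeg",    // our compiled ffmpeg binary (assumption)
            "-i", "/sdcard/capture.mp4",            // 1: the actual normal video capture
            "-i", "/sdcard/masks/mask_%05d.png",    // 2: the series of alpha-channel mask bitmaps
            "-filter_complex", "[0:v][1:v]overlay", // draw the masks over the capture
            "/sdcard/redacted.mp4"
        };
        Process ffmpeg = Runtime.getRuntime().exec(cmd);
        ffmpeg.waitFor();
    }
}
```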

Benefits: It allows for processing of video as it comes in and supports audio. It is a proper format that is sharable and close to universally playable.

Drawbacks: This technique, while initially fast, may be slow in the actual processing phase.

## Option 4: Research WebM and the like