An Introduction to Media Foundation - Understanding the API Through a COM Lens

Media Foundation, COM, C++, Windows Development

When you start working with Media Foundation, it is easy to feel that you set out to use Windows’ video and audio APIs and suddenly everything is about COM. CoInitializeEx, MFStartup, IMFSourceReader, IMFMediaType, IMFTransform, IMFActivate, HRESULT, GUIDs - they all arrive at once, the code abruptly takes on a Win32/COM flavor, and it becomes hard to see what Media Foundation actually is.

Rather than covering all of Media Foundation like a dictionary, this article focuses on three things.

  • Why does COM naturally come up when you use Media Foundation?
  • Where exactly does the COM flavor intensify?
  • Where should you start - Source Reader / Sink Writer / Media Session / MFT?

The code examples are in C++, but the way of thinking is essentially the same when you use Media Foundation through wrappers from .NET or other languages.

Table of Contents

  1. The Conclusion First (In One Line)
  2. The Orientation Tables to Look at First
    • 2.1. What to Touch for What You Want to Do
    • 2.2. Where It Puts On Its COM Face
    • 2.3. Terms to Grasp the Meaning of First
  3. The Big Picture of Media Foundation (Diagram)
  4. Where Media Foundation Puts On Its COM Face
    • 4.1. CoInitializeEx and MFStartup Side by Side at Initialization
    • 4.2. Object Hand-Offs Are Interface-Centric
    • 4.3. Activation Objects Appear
    • 4.4. Settings and Type Information Center on IMFAttributes and GUIDs
    • 4.5. Asynchrony, Callbacks, and Threading Are Also Handled the COM Way
    • 4.6. But Media Foundation ≠ COM
  5. A Rough Guide to Choosing
    • 5.1. Starting With the Source Reader
    • 5.2. Writing to Files With the Sink Writer
    • 5.3. Handling Playback and Synchronization With the Media Session
    • 5.4. Plugging In Custom Components With MFTs
  6. A Practical Checklist
  7. Code Excerpts
    • 7.1. Initialization
    • 7.2. Creating a Source Reader in Synchronous Mode
    • 7.3. Creating a Source Reader in Asynchronous Mode
    • 7.4. Enumerating and Instantiating MFTs With MFTEnumEx
  8. Conclusion

1. The Conclusion First (In One Line)

  • Media Foundation is a platform for handling video and audio - the API as a whole is not, in itself, pure COM
  • However, the boundaries between source / transform / sink / activation / attributes / callback are expressed as COM interfaces, so as you use it, IUnknown, HRESULT, GUIDs, and apartment topics naturally come up
  • It is easiest to start with the Source Reader / Sink Writer, move on to the Media Session when you need playback control, and to MFTs when you need custom transforms

In short, Media Foundation is a media-processing platform whose boundary surfaces are deeply infused with COM.

If you internalize that up front, the question “why does it suddenly put on a COM face?” becomes far clearer.

2. The Orientation Tables to Look at First

2.1. What to Touch for What You Want to Do

Looking at this table first makes the entry point easier to choose.

What you want to do | Touch first | COM intensity | Notes
--- | --- | --- | ---
Get frames / samples from a file or camera | Source Reader | Medium | Takes care of decoders for you when needed
Write generated audio / video to a file | Sink Writer | Medium | Can bundle the encoder and media sink for you
Handle playback, stop, seek, A/V sync, quality control | Media Session | High | Requires understanding topologies and sessions
Plug in custom transforms or codec-like components | MFT | High | Work in terms of IMFTransform
Inspect enumerated candidates, then instantiate only what you need | IMFActivate | High | What you get back may be an activation object, not the real thing

2.2. Where It Puts On Its COM Face

Point | What appears | What to understand first
--- | --- | ---
Initialization | CoInitializeEx, MFStartup | COM initialization and Media Foundation initialization are separate
Object creation and hand-off | IMFSourceReader, IMFMediaType, IMFTransform | Most are interface pointers + HRESULT
Settings | IMFAttributes, GUIDs | Settings and type info expressed as key/value + GUIDs
Enumeration / deferred creation | IMFActivate, ActivateObject | Enumeration results may not be the real objects
Asynchrony | IMFSourceReaderCallback, work queues | You must be conscious of callbacks and apartments
Playback control | Topologies, Media Session | The overall pipeline flow is a Media Foundation-specific concept

2.3. Terms to Grasp the Meaning of First

Term | Meaning here
--- | ---
Media Source | The entry point that feeds media data into the pipeline: files, networks, capture devices, etc.
MFT | Media Foundation Transform. The common model for decoders, encoders, video converters, and so on
Media Sink | Where media data goes: screen display, audio output, file output, etc.
Media Session | The mechanism that manages the flow of the whole pipeline; handles playback and synchronization
Topology | The connection diagram describing how source / transform / sink are wired
Activation Object | A helper object for creating the real object later; represented by IMFActivate
Attributes | A key/value store keyed by GUIDs; used heavily throughout Media Foundation

Having these as vocabulary up front dramatically reduces the friction of reading the documentation.

3. The Big Picture of Media Foundation (Diagram)

Seen broadly, Media Foundation is a story about a media pipeline. The COM aspect matters, but it is easier to get organized by looking at the big picture first.

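Diagram: the two ways to use Media Foundation
  • Model where the app handles data directly: Media Source -> Source Reader (+ decoder) -> App -> Sink Writer (+ encoder) -> Media Sink
  • Model using the full pipeline: Media Source -> MFT -> Media Sink, with the Media Session managing the flow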

Broadly, Media Foundation has two modes of use.

  • The model using the full pipeline
    • You wire up source / transform / sink, and the Media Session manages data flow and A/V synchronization
  • The model where the app handles data directly
    • You pull data from a source with the Source Reader, and push it into a sink with the Sink Writer

The latter is the easier entry point when you want to process frames and samples yourself. On the other hand, if you want playback and synchronization handled by the platform, the former is the proper route.

The thing to keep in mind is that Media Foundation is, at heart, a media-processing platform; working with it is not quite the same as directly manipulating a pile of COM objects.

But once you start looking at the boundaries between its components, the COM face suddenly intensifies. The next chapter walks through those points in order.

4. Where Media Foundation Puts On Its COM Face

4.1. CoInitializeEx and MFStartup Side by Side at Initialization

This is where most people first feel the strangeness. Before any talk of opening a file or capturing from a camera, CoInitializeEx and MFStartup appear.

  • CoInitializeEx initializes the COM library
  • MFStartup initializes the Media Foundation platform

In other words, COM initialization alone is not enough - you also need Media Foundation-side initialization. Here it dawns on you: “this is not just a video API; there is a substantial COM-based contract underneath.”

In practice, deciding the following at this point makes everything later easier.

  • Which thread uses Media Foundation?
  • Will that thread be STA or MTA?
  • Who owns the responsibilities of MFStartup / MFShutdown and CoInitializeEx / CoUninitialize?

If you press ahead with this design left vague, things get confusing later around callbacks and UI integration.
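To make the ownership question concrete, here is a minimal portable sketch, assuming you want each initialization step paired with its own teardown. `InitTeardownStack` is a hypothetical helper, not part of COM or Media Foundation: it runs each init step, remembers the matching teardown only if the step succeeded, and unwinds in reverse order, so MFShutdown runs before CoUninitialize. The Windows calls appear only in a comment so that the block itself builds with the standard library alone.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not a Media Foundation API): pairs each successful
// initialization step with its teardown and undoes them last-in, first-out.
//
// Usage on Windows (sketch):
//   InitTeardownStack init;
//   bool ok = init.Step([] { return SUCCEEDED(CoInitializeEx(nullptr, COINIT_MULTITHREADED)); },
//                       [] { CoUninitialize(); })
//          && init.Step([] { return SUCCEEDED(MFStartup(MF_VERSION)); },
//                       [] { MFShutdown(); });
//   // SUCCEEDED also accepts S_FALSE ("COM already initialized on this thread"),
//   // which still requires a matching CoUninitialize.
class InitTeardownStack
{
public:
    InitTeardownStack() = default;
    InitTeardownStack(const InitTeardownStack&) = delete;
    InitTeardownStack& operator=(const InitTeardownStack&) = delete;

    ~InitTeardownStack() { Unwind(); }

    // Runs 'init'; only if it reports success is 'teardown' recorded.
    bool Step(const std::function<bool()>& init, std::function<void()> teardown)
    {
        if (!init())
        {
            return false;
        }
        teardowns_.push_back(std::move(teardown));
        return true;
    }

    // Runs the recorded teardowns in reverse order of initialization.
    void Unwind()
    {
        while (!teardowns_.empty())
        {
            std::function<void()> action = std::move(teardowns_.back());
            teardowns_.pop_back();
            action();
        }
    }

private:
    std::vector<std::function<void()>> teardowns_;
};
```

One constraint shapes where such an object lives: CoUninitialize must be called on the same thread that called CoInitializeEx, so the object should be created and destroyed on the thread that uses Media Foundation.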

4.2. Object Hand-Offs Are Interface-Centric

As you read the Media Foundation APIs, you will notice that nearly every function returns an HRESULT and hands objects back through out parameters typed as COM interface pointers.

  • IMFSourceReader
  • IMFMediaType
  • IMFTransform
  • IMFActivate
  • IMFSample
  • IMFMediaBuffer

What is distinctive is that not only the data itself but also type information and settings are expressed as interfaces.

For example:

  • IMFTransform is the interface representing an MFT
  • IMFAttributes is a key/value store
  • IMFMediaType inherits IMFAttributes and is “a description of a media format”

Even something as settings-data-like as a media type is held as a COM interface. This is where the context of IUnknown, QueryInterface, AddRef / Release, and HRESULT flows in naturally.

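Interface inheritance: IMFMediaType and IMFActivate derive from IMFAttributes, which derives from IUnknown; IMFSourceReader and IMFTransform derive directly from IUnknown.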

By this point you can see: “Media Foundation is a media API, but the way it expresses its boundaries is very much COM.”
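The reference-counting half of that contract is usually wrapped in a smart pointer rather than handled with manual Release calls. In production code you would normally reach for Microsoft::WRL::ComPtr or ATL's CComPtr; the following is only a minimal portable sketch of what such a wrapper does, assuming nothing beyond AddRef() and Release() on the pointee. The name `RefPtr` is hypothetical.

```cpp
#include <cassert>
#include <utility>

// Minimal sketch of an owning COM-style pointer. It works with any type that
// exposes AddRef()/Release(), which includes every Media Foundation interface.
template <class T>
class RefPtr
{
public:
    RefPtr() = default;
    ~RefPtr() { Reset(); }

    // Copying shares ownership, so it takes another reference.
    RefPtr(const RefPtr& other) : p_(other.p_)
    {
        if (p_ != nullptr) p_->AddRef();
    }

    // Moving transfers the existing reference; no AddRef/Release needed.
    RefPtr(RefPtr&& other) noexcept : p_(std::exchange(other.p_, nullptr)) {}

    // Copy-and-swap covers both copy and move assignment.
    RefPtr& operator=(RefPtr other) noexcept
    {
        std::swap(p_, other.p_);
        return *this;
    }

    // For out parameters such as MFCreateMediaType(ptr.ReleaseAndGetAddressOf()):
    // drops the current pointer, then adopts the reference the callee returns.
    T** ReleaseAndGetAddressOf()
    {
        Reset();
        return &p_;
    }

    void Reset()
    {
        if (p_ != nullptr)
        {
            p_->Release();
            p_ = nullptr;
        }
    }

    T* Get() const { return p_; }
    T* operator->() const { return p_; }
    explicit operator bool() const { return p_ != nullptr; }

private:
    T* p_ = nullptr;
};
```

The key design choice is that `ReleaseAndGetAddressOf` adopts the reference handed out by a creation function instead of adding another one, which matches the convention COM out parameters follow and makes the SafeRelease calls in the excerpts unnecessary.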

4.3. Activation Objects Appear

The COM-ness of Media Foundation shows itself most strongly in activation objects.

IMFActivate is a helper object for creating the real object later. Intuitively, it is easiest to see it as something close to COM’s class factory.

When this appears, an enumeration API’s return value may not be “an immediately usable object” but, first, an array of IMFActivate*. You then instantiate only what you need with ActivateObject.

Sequence: the app calls the enumeration API -> receives an array of IMFActivate* -> inspects the attributes -> calls ActivateObject(...) on the chosen IMFActivate -> receives the real COM object (IMFTransform / sink, etc.)

This shape fits well with Media Foundation’s design of discovering swappable components later and composing them.

Also, since an activation object can itself carry attributes, the flow tends to be: “inspect the candidates’ attributes first,” “configure if needed,” “instantiate later.” This too is very COM-like.
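The enumerate, inspect, then activate flow can be modeled without any Windows headers. The sketch below is a portable analogy, not the real IMFActivate: `ActivationCandidate`, `Decoder`, and `ActivateFirstMatching` are hypothetical names, and plain string keys stand in for the GUID keys that real activation objects use. What it shows is the ordering: attributes are inspected on every candidate, but only the chosen one is ever instantiated.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the "real object" an activation object eventually creates.
struct Decoder
{
    std::string name;
};

// Portable model of an activation object: cheap-to-inspect attributes plus
// a deferred factory for the real object.
class ActivationCandidate
{
public:
    ActivationCandidate(std::map<std::string, std::string> attributes,
                        std::function<std::unique_ptr<Decoder>()> factory)
        : attributes_(std::move(attributes)), factory_(std::move(factory)) {}

    // Reading attributes never creates the real object.
    std::string GetAttribute(const std::string& key) const
    {
        auto it = attributes_.find(key);
        return it == attributes_.end() ? std::string() : it->second;
    }

    // Analogous to IMFActivate::ActivateObject: create the real object now.
    std::unique_ptr<Decoder> Activate() const { return factory_(); }

private:
    std::map<std::string, std::string> attributes_;
    std::function<std::unique_ptr<Decoder>()> factory_;
};

// Inspect every candidate, but instantiate only the first that matches.
std::unique_ptr<Decoder> ActivateFirstMatching(const std::vector<ActivationCandidate>& candidates,
                                               const std::string& key,
                                               const std::string& value)
{
    for (const auto& candidate : candidates)
    {
        if (candidate.GetAttribute(key) == value)
        {
            return candidate.Activate();
        }
    }
    return nullptr;
}
```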

4.4. Settings and Type Information Center on IMFAttributes and GUIDs

Working with Media Foundation, there is a point where the settings suddenly look like nothing but GUIDs. At the center of that is IMFAttributes, a key/value store keyed by GUIDs. It is used extremely heavily throughout Media Foundation.

Especially important is IMFMediaType, which inherits IMFAttributes and carries media-format information as attributes.

For example:

  • Major type (audio or video): MF_MT_MAJOR_TYPE
  • Subtype (H.264, AAC, RGB32, PCM, etc.): MF_MT_SUBTYPE
  • Frame size: MF_MT_FRAME_SIZE
  • Frame rate: MF_MT_FRAME_RATE
  • Sample rate: MF_MT_AUDIO_SAMPLES_PER_SECOND
  • Channel count: MF_MT_AUDIO_NUM_CHANNELS

It is easy to feel this as “a forest of GUIDs,” but what is actually happening is quite plain.

  • Hold settings in an attribute store
  • Express media types as attribute stores too
  • Negotiate formats between source / transform / sink by inspecting those attributes

It is simply that COM-style interfaces and GUIDs are used to express settings and type information.
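One concrete example of how plain the “forest of GUIDs” really is: attributes such as MF_MT_FRAME_SIZE and MF_MT_FRAME_RATE each hold a single UINT64 in which two 32-bit values are packed, the width (or numerator) in the high 32 bits and the height (or denominator) in the low 32 bits. On Windows you would use the mfapi.h helpers MFSetAttributeSize / MFGetAttributeSize and MFSetAttributeRatio / MFGetAttributeRatio; the portable sketch below only reproduces the packing, and its function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Packs two 32-bit values into one 64-bit attribute value:
// first value in the high 32 bits, second value in the low 32 bits.
constexpr std::uint64_t PackTwoUInt32(std::uint32_t high, std::uint32_t low)
{
    return (static_cast<std::uint64_t>(high) << 32) | low;
}

// Reverses PackTwoUInt32.
inline void UnpackTwoUInt32(std::uint64_t packed, std::uint32_t* high, std::uint32_t* low)
{
    *high = static_cast<std::uint32_t>(packed >> 32);
    *low = static_cast<std::uint32_t>(packed & 0xFFFFFFFFu);
}
```

For example, a 1920x1080 frame size is stored as the single value 0x0000078000000438, and a 29.97 fps frame rate as the pair 30000/1001 packed the same way.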

4.5. Asynchrony, Callbacks, and Threading Are Also Handled the COM Way

What is easy to miss in real-world Media Foundation work is asynchronous processing and the threading model.

For example, the Source Reader defaults to synchronous mode. In synchronous mode, ReadSample blocks. Depending on the state of the file, the network, or the device, that wait can become visibly long.

To use asynchronous mode, you pass a callback when creating the Source Reader. You prepare an object implementing IMFSourceReaderCallback, set it on the MF_SOURCE_READER_ASYNC_CALLBACK attribute, and then create the reader.

Even more important is the apartment question. Media Foundation’s asynchronous processing runs on work queues, and work queue threads are MTA threads. The implementation therefore stays simpler if the application side also works in the MTA.

Sequence: the app thread calls ReadSample(...) on the Source Reader -> the call returns immediately -> the Source Reader processes the request internally on an MF work queue (MTA) -> IMFSourceReaderCallback::OnReadSample(...) is invoked on the work queue thread

Around callbacks, these are the things to watch.

  • Do not touch the UI thread’s STA objects directly from the callback side
  • Make the callback implementation thread-safe
  • If UI updates are needed, marshal only the results back to the UI thread
  • Fix in your mind, from the start, “which thread do Media Foundation callbacks arrive on?”

Media Foundation does not deal with the threading requirements of STA objects for you. It is therefore easier to keep things organized by running the Media Foundation side in the MTA and building an explicit bridge to the UI.
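A minimal portable sketch of such a bridge, assuming the callback only needs to hand plain results (here, sample timestamps) to the UI: the callback thread posts values into a mutex-protected queue, and the UI thread drains it without blocking. `ResultQueue` is a hypothetical name; on Win32 the UI thread might be woken by a PostMessage sent from the callback, but that wiring is left out so the block builds with the standard library alone.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical bridge: the producer is a Media Foundation callback on an MTA
// work-queue thread; the consumer is the UI thread, the only one that touches UI objects.
template <class T>
class ResultQueue
{
public:
    // Producer side, e.g. called from inside OnReadSample with copied results.
    void Post(T value)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(value));
        }
        ready_.notify_one();
    }

    // Consumer side for a UI thread: never blocks; returns false when empty.
    bool TryTake(T& out)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty())
        {
            return false;
        }
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

    // Blocking consumer for non-UI threads: waits until an item is available.
    T Take()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop_front();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<T> items_;
};
```

Posting copies of plain data, rather than pointers to UI-owned objects, is what keeps the callback side free of STA concerns.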

4.6. But Media Foundation ≠ COM

Having read this far, it is easy to think “so Media Foundation is just COM after all.” But that is not quite right.

Media Foundation has platform-specific concepts that COM generalities do not cover.

  • MFStartup / MFShutdown
  • Media Session
  • Topologies
  • The topology loader
  • The presentation clock
  • Source Reader / Sink Writer

These are Media Foundation’s own concerns: how to run the media pipeline.

For example, in the Media Session, the application hands over a partial topology, and the topology loader resolves it into a full topology by filling in the needed transforms. That is not generic COM - it is functionality Media Foundation has as a media-processing platform.

Diagram: partial topology (Source -> Output) -> topology loader -> full topology (Source -> Decoder MFT -> Output)

Media Foundation uses COM to express the contracts between its components, and on top of that operates as a media-processing platform. Holding this two-layer view keeps you from getting lost.
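To give a feel for what “filling in the needed transforms” means, here is a deliberately tiny model. It is not the topology loader's actual algorithm, which works with full media types and the transforms registered on the system; it only searches a hypothetical list of transforms (`TransformInfo`, `ResolveChain`) for the shortest chain that connects the source's output subtype to the subtype the sink accepts.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy description of an available transform: one input subtype, one output subtype.
struct TransformInfo
{
    std::string name;
    std::string input;   // e.g. "H264"
    std::string output;  // e.g. "NV12"
};

// Returns the transforms to splice in, in order. 'found' is false when no chain exists;
// an empty chain with 'found' true means source and sink already match.
std::vector<std::string> ResolveChain(const std::string& sourceOutput,
                                      const std::string& sinkInput,
                                      const std::vector<TransformInfo>& transforms,
                                      bool* found)
{
    // Breadth-first search over formats, so the shortest chain wins.
    std::map<std::string, std::pair<std::string, std::string>> cameFrom;  // format -> (previous format, transform)
    std::set<std::string> visited{sourceOutput};
    std::queue<std::string> frontier;
    frontier.push(sourceOutput);

    while (!frontier.empty())
    {
        std::string format = frontier.front();
        frontier.pop();
        if (format == sinkInput)
        {
            // Walk back from the sink format to the source format.
            std::vector<std::string> chain;
            for (std::string f = format; f != sourceOutput; f = cameFrom[f].first)
            {
                chain.insert(chain.begin(), cameFrom[f].second);
            }
            *found = true;
            return chain;
        }
        for (const auto& t : transforms)
        {
            if (t.input == format && visited.insert(t.output).second)
            {
                cameFrom[t.output] = {format, t.name};
                frontier.push(t.output);
            }
        }
    }
    *found = false;
    return {};
}
```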

5. A Rough Guide to Choosing

When deciding your first entry point, this diagram is usually enough.

What do you need first?
  • Read frames / samples -> Source Reader
  • Write out to a file -> Sink Writer
  • Need playback control or A/V sync -> Media Session
  • Plug in a custom transform -> MFT

5.1. Starting With the Source Reader

The Source Reader is a very approachable entry point when you want to pull data from files or devices.

It suits cases like these.

  • Getting frames from a video file
  • Decoding an audio file to get samples
  • Getting frames from a camera
  • Connecting a Media Foundation source to your own processing pipeline

The Source Reader loads decoders as needed and hands data to the application. On the other hand, it does not take care of presentation clock management, A/V synchronization, or rendering to the screen itself.

It becomes clear when you think of it as an entry point for “getting data,” not for “playing back.”
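Because the Source Reader has no presentation clock, an app that wants to show frames at the right moment has to pace them itself, using the timestamps ReadSample returns. Those timestamps are in 100-nanosecond units. The sketch below assumes playback started at a known moment and computes how long to wait before presenting a frame; `TimeUntilDue` and the `Hns` alias are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <ratio>

// Media Foundation timestamps are in 100-nanosecond units.
using Hns = std::chrono::duration<std::int64_t, std::ratio<1, 10000000>>;

// Returns how long to wait before presenting a frame whose timestamp is
// sampleTime100ns, given how much time has elapsed since playback started.
// Late frames get zero; the result is rounded down to whole milliseconds.
inline std::chrono::milliseconds TimeUntilDue(std::int64_t sampleTime100ns,
                                              std::chrono::steady_clock::duration elapsed)
{
    auto due = std::chrono::duration_cast<std::chrono::steady_clock::duration>(Hns(sampleTime100ns));
    if (due <= elapsed)
    {
        return std::chrono::milliseconds(0);
    }
    return std::chrono::duration_cast<std::chrono::milliseconds>(due - elapsed);
}
```

A frame stamped 10000000 (one second) with 400 ms already elapsed is due in 600 ms; a frame whose time has already passed should be shown immediately or dropped.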

5.2. Writing to Files With the Sink Writer

The Sink Writer is the entry point when you want to write audio or video to files.

Typical uses include:

  • Saving generated frames into a video file
  • Encoding and writing out audio samples
  • Converting read data to another format and saving it

The Sink Writer finds and loads encoders as needed and manages the data flow into the media sink. It is often combined with the Source Reader, but the two are independent components - they do not have to be used as a pair.
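When you generate frames yourself, each IMFSample you pass to the Sink Writer carries a sample time and a duration (IMFSample::SetSampleTime / SetSampleDuration), both in 100-nanosecond units. A small sketch, assuming a constant frame rate given as a numerator/denominator pair: computing each frame's time directly from its index, rather than repeatedly adding a rounded per-frame duration, keeps the timestamps from drifting. The helper names are hypothetical, and the integer division truncates, so results can differ by one unit from Media Foundation's own MFFrameRateToAverageTimePerFrame.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::int64_t kHnsPerSecond = 10000000;  // 100-ns units per second

// Presentation time of frame 'frameIndex' at rateNumerator/rateDenominator fps.
constexpr std::int64_t FrameTime100ns(std::int64_t frameIndex,
                                      std::uint32_t rateNumerator,
                                      std::uint32_t rateDenominator)
{
    return frameIndex * kHnsPerSecond * rateDenominator / rateNumerator;
}

// Duration chosen so that consecutive frames tile the timeline exactly.
constexpr std::int64_t FrameDuration100ns(std::int64_t frameIndex,
                                          std::uint32_t rateNumerator,
                                          std::uint32_t rateDenominator)
{
    return FrameTime100ns(frameIndex + 1, rateNumerator, rateDenominator) -
           FrameTime100ns(frameIndex, rateNumerator, rateDenominator);
}
```

At 30 fps the first frames land at 0, 333333, 666666, and 1000000, so the durations come out as 333333, 333333, 333334 and frame 30 lands exactly on one second.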

5.3. Handling Playback and Synchronization With the Media Session

If your goal is not “pull data from a file” but proper playback, it is more natural to think around the Media Session.

The Media Session’s turn comes when you have requirements like these.

  • Handling play / stop / seek
  • Leaving audio-video synchronization to the platform
  • Managing the pipeline including quality control and format changes
  • Composing the source / transform / sink flow using topologies

Entering this layer brings you closer to “Media Foundation proper” than the Source Reader / Sink Writer. Accordingly, Media Foundation-specific concepts - topologies, session events - also multiply.

5.4. Plugging In Custom Components With MFTs

The MFT is Media Foundation’s common model for transforms.

You enter this world in situations like:

  • Building your own decoder or encoder
  • Plugging video- or audio-processing components into the pipeline
  • Enumerating codecs and converters and choosing them yourself
  • Controlling things more deeply than the default automatic resolution

In the MFT world, COM-style contracts come strongly to the fore: IMFTransform, IMFActivate, media type negotiation, sample / buffer management. Therefore, rather than diving straight into MFTs as your first entry point, it is clearer to first determine whether what you actually need is the Source Reader / Sink Writer / Media Session.
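The core of those contracts is the push/pull data flow of a synchronous MFT: the caller pushes input with ProcessInput, pulls output with ProcessOutput until the transform reports MF_E_TRANSFORM_NEED_MORE_INPUT, may get MF_E_NOTACCEPTING if it pushes while output is still pending, and at end of stream sends MFT_MESSAGE_COMMAND_DRAIN before pulling the remainder. The portable model below mirrors only that control flow; `MftStatus`, `PairSummingTransform`, and `RunTransform` are hypothetical, and a real IMFTransform also involves stream IDs, media types, and IMFSample buffers.

```cpp
#include <cassert>
#include <vector>

// Stand-ins for S_OK, MF_E_NOTACCEPTING, and MF_E_TRANSFORM_NEED_MORE_INPUT.
enum class MftStatus { Ok, NotAccepting, NeedMoreInput };

// Hypothetical transform that needs two inputs per output (it sums them),
// like a decoder that needs several packets for one frame.
class PairSummingTransform
{
public:
    MftStatus ProcessInput(int value)
    {
        if (hasOutput_)
        {
            return MftStatus::NotAccepting;  // caller must pull pending output first
        }
        pending_.push_back(value);
        if (pending_.size() == 2)
        {
            output_ = pending_[0] + pending_[1];
            pending_.clear();
            hasOutput_ = true;
        }
        return MftStatus::Ok;
    }

    // Counterpart of the drain command: turn whatever is buffered into output.
    void Drain()
    {
        if (!hasOutput_ && !pending_.empty())
        {
            output_ = pending_[0];
            pending_.clear();
            hasOutput_ = true;
        }
    }

    MftStatus ProcessOutput(int* out)
    {
        if (!hasOutput_)
        {
            return MftStatus::NeedMoreInput;
        }
        *out = output_;
        hasOutput_ = false;
        return MftStatus::Ok;
    }

private:
    std::vector<int> pending_;
    int output_ = 0;
    bool hasOutput_ = false;
};

// The driver loop a caller runs around a synchronous transform.
std::vector<int> RunTransform(PairSummingTransform& mft, const std::vector<int>& inputs)
{
    std::vector<int> outputs;
    int out = 0;
    for (int value : inputs)
    {
        // If input is refused, pull the pending output and retry.
        while (mft.ProcessInput(value) == MftStatus::NotAccepting)
        {
            while (mft.ProcessOutput(&out) == MftStatus::Ok) outputs.push_back(out);
        }
        // Pull until the transform asks for more input.
        while (mft.ProcessOutput(&out) == MftStatus::Ok) outputs.push_back(out);
    }
    // End of stream: drain, then collect what remains.
    mft.Drain();
    while (mft.ProcessOutput(&out) == MftStatus::Ok) outputs.push_back(out);
    return outputs;
}
```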

6. A Practical Checklist

Finally, here is a one-page summary of what to check first in real work.

Item | What to check | What tends to happen if missed
--- | --- | ---
Initialization responsibility | Decide where CoInitializeEx and MFStartup are called and who owns teardown | Missed initialization, confused teardown order
Apartment | Decide up front whether the thread touching MF is STA or MTA | Confusion around callbacks, collisions with the UI
Source Reader mode | Decide sync vs. async at creation time | ReadSample blocks unexpectedly; the mode cannot be switched later
Media type negotiation | Enumerate the available output formats and explicitly set the one actually used | MF_E_INVALIDMEDIATYPE, formats other than expected
Object lifetimes | Make the responsibilities for Release, Unlock, ShutdownObject explicit | Memory leaks, retained buffers, inconsistency at shutdown
Activation objects | Distinguish whether enumeration results are real objects or IMFActivate | Failures from calling QueryInterface on an IMFActivate as if it were the real object
Topology | Know whether you are handling a partial or full topology | Getting stuck assuming “it should connect automatically”
Error checking | Check HRESULT, stream flags, and events every time | Missing partial failures
UI integration | Never touch the UI directly from callbacks; marshal only results to the UI thread | Hangs, races, hard-to-diagnose bugs

The three highest-priority items are:

  1. Do not pick the wrong entry API
    • First determine which of Source Reader / Sink Writer / Media Session you actually need
  2. Decide the apartment up front
    • If the STA UI and Media Foundation’s work queues will mix, decide the bridging approach first
  3. Do not be sloppy with media type negotiation
    • Proceeding on “it’s probably this format” gets very confusing later
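For the third point, a minimal sketch of choosing a format deliberately, assuming you have already collected the subtypes a stream offers (on Windows, by calling IMFSourceReader::GetNativeMediaType with increasing indices until it returns MF_E_NO_MORE_TYPES). Strings stand in for subtype GUIDs, and `ChooseSubtype` is a hypothetical helper; the point is that an empty result is an explicit failure rather than a silent assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Picks the first subtype from 'preferred' (highest priority first) that the
// stream actually offers. Returns an empty string if none is acceptable, so the
// caller can fail loudly instead of proceeding on "it's probably this format".
std::string ChooseSubtype(const std::vector<std::string>& offered,
                          const std::vector<std::string>& preferred)
{
    for (const auto& want : preferred)
    {
        for (const auto& have : offered)
        {
            if (have == want)
            {
                return have;
            }
        }
    }
    return std::string();
}
```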

7. Code Excerpts

Rather than complete samples, here are just enough excerpts to show where the COM face appears.

7.1. Initialization

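```cpp
// Windows SDK headers. Link with mfplat.lib, mfreadwrite.lib, mfuuid.lib, and ole32.lib.
#include <windows.h>
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <mferror.h>

// Releases a COM interface pointer (if any) and resets it to nullptr.
template <class T>
void SafeRelease(T** pp)
{
    if (pp != nullptr && *pp != nullptr)
    {
        (*pp)->Release();
        *pp = nullptr;
    }
}

// Initializes COM (as MTA) and then Media Foundation on the calling thread.
HRESULT InitializeMediaFoundationForCurrentThread()
{
    // S_FALSE (COM already initialized on this thread) also counts as success
    // and still requires a matching CoUninitialize.
    HRESULT hr = CoInitializeEx(nullptr, COINIT_MULTITHREADED);
    if (FAILED(hr))
    {
        return hr;
    }

    hr = MFStartup(MF_VERSION);
    if (FAILED(hr))
    {
        CoUninitialize();
        return hr;
    }

    return S_OK;
}

// Tears down in reverse order: Media Foundation first, then COM.
void UninitializeMediaFoundationForCurrentThread()
{
    MFShutdown();
    CoUninitialize();
}
```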

Here, CoInitializeEx and MFStartup sit side by side. This is the first point at which the COM atmosphere abruptly thickens while working with Media Foundation.

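In real applications, another layer (a UI framework, for example) may already have initialized COM on the thread. If that layer chose STA, CoInitializeEx with COINIT_MULTITHREADED fails with RPC_E_CHANGED_MODE, so it is safer to settle up front who owns initialization and in which apartment.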

7.2. Creating a Source Reader in Synchronous Mode

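```cpp
HRESULT ReadOneVideoSample(PCWSTR path)
{
    IMFAttributes* pAttributes = nullptr;
    IMFSourceReader* pReader = nullptr;
    IMFMediaType* pType = nullptr;
    IMFSample* pSample = nullptr;

    // Declared before the first goto: C++ does not allow a goto to jump
    // past the initialization of a local variable.
    DWORD streamFlags = 0;
    LONGLONG timestamp = 0;  // 100-ns units

    // Common video decoders output YUV formats such as NV12, not RGB32.
    // This attribute lets the reader add YUV-to-RGB32 conversion; without it,
    // requesting RGB32 typically fails with MF_E_INVALIDMEDIATYPE.
    HRESULT hr = MFCreateAttributes(&pAttributes, 1);
    if (FAILED(hr)) goto done;

    hr = pAttributes->SetUINT32(MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING, TRUE);
    if (FAILED(hr)) goto done;

    hr = MFCreateSourceReaderFromURL(path, pAttributes, &pReader);
    if (FAILED(hr)) goto done;

    hr = MFCreateMediaType(&pType);
    if (FAILED(hr)) goto done;

    hr = pType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    if (FAILED(hr)) goto done;

    hr = pType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB32);
    if (FAILED(hr)) goto done;

    hr = pReader->SetCurrentMediaType(
        MF_SOURCE_READER_FIRST_VIDEO_STREAM,
        nullptr,
        pType);
    if (FAILED(hr)) goto done;

    // Synchronous mode: blocks until a sample or a stream event is available.
    hr = pReader->ReadSample(
        MF_SOURCE_READER_FIRST_VIDEO_STREAM,
        0,
        nullptr,
        &streamFlags,
        &timestamp,
        &pSample);
    if (FAILED(hr)) goto done;

    // Success does not guarantee a sample: at end of stream (or on a stream
    // tick) pSample is nullptr, so check the flags as well.
    if ((streamFlags & MF_SOURCE_READERF_ENDOFSTREAM) == 0 && pSample != nullptr)
    {
        // Extract the IMFMediaBuffer from pSample and process it
    }

done:
    SafeRelease(&pSample);
    SafeRelease(&pType);
    SafeRelease(&pReader);
    SafeRelease(&pAttributes);
    return hr;
}
```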

What becomes visible here:

  • Both the reader and the media type are COM interfaces
  • Settings are GUID-based
  • Return values are HRESULT
  • In synchronous mode, ReadSample blocks

Even “I just want to read one frame” takes on a heavily COM face at Media Foundation’s boundary.

7.3. Creating a Source Reader in Asynchronous Mode

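```cpp
HRESULT CreateSourceReaderAsync(
    PCWSTR path,
    IMFSourceReaderCallback* pCallback,
    IMFSourceReader** ppReader)
{
    IMFAttributes* pAttributes = nullptr;

    HRESULT hr = MFCreateAttributes(&pAttributes, 1);
    if (FAILED(hr))
    {
        return hr;
    }

    // Setting the callback before creation is what puts the reader into
    // asynchronous mode. In this mode ReadSample is called with nullptr for
    // its out parameters, and results arrive in OnReadSample instead.
    hr = pAttributes->SetUnknown(MF_SOURCE_READER_ASYNC_CALLBACK, pCallback);
    if (SUCCEEDED(hr))
    {
        hr = MFCreateSourceReaderFromURL(path, pAttributes, ppReader);
    }

    SafeRelease(&pAttributes);
    return hr;
}
```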

Here, to enable asynchronous mode, the callback is placed into the attributes before the reader is created.

That is:

  • The callback itself is a COM interface
  • The async configuration goes through IMFAttributes
  • The mode is fixed at creation time

In practice, what matters is making the IMFSourceReaderCallback implementation thread-safe and not carrying UI objects directly into it.

7.4. Enumerating and Instantiating MFTs With MFTEnumEx

```cpp
HRESULT FindH264Decoder(IMFTransform** ppTransform)
{
    *ppTransform = nullptr;

    IMFActivate** ppActivate = nullptr;
    UINT32 count = 0;

    // Look for decoders that accept H.264 video as input (any output type).
    MFT_REGISTER_TYPE_INFO inputType = {};
    inputType.guidMajorType = MFMediaType_Video;
    inputType.guidSubtype = MFVideoFormat_H264;

    HRESULT hr = MFTEnumEx(
        MFT_CATEGORY_VIDEO_DECODER,
        MFT_ENUM_FLAG_SYNCMFT | MFT_ENUM_FLAG_LOCALMFT,
        &inputType,
        nullptr,
        &ppActivate,
        &count);
    if (FAILED(hr))
    {
        return hr;
    }

    if (count == 0)
    {
        CoTaskMemFree(ppActivate);
        return MF_E_TOPO_CODEC_NOT_FOUND;
    }

    // The enumeration returned activation objects, not transforms.
    // This excerpt simply instantiates the first candidate.
    hr = ppActivate[0]->ActivateObject(
        __uuidof(IMFTransform),
        reinterpret_cast<void**>(ppTransform));

    // Every element must be released, then the array itself freed.
    for (UINT32 i = 0; i < count; ++i)
    {
        ppActivate[i]->Release();
    }
    CoTaskMemFree(ppActivate);

    return hr;
}
```

Here, the enumeration result comes back not as IMFTransform* from the start but as IMFActivate**. Only by calling ActivateObject do you finally obtain the real IMFTransform.

This flow captures Media Foundation’s “suddenly putting on its COM face” feeling remarkably well.

8. Conclusion

It is no accident that COM topics multiply when you work with Media Foundation.

  • Media Foundation is a media-processing platform
  • Its boundaries - source / transform / sink / activation / callback - are expressed as COM interfaces
  • As a result, IUnknown, HRESULT, GUIDs, apartments, and callbacks naturally come up
  • However, the heart of Media Foundation is a media pipeline with a Media Session and topologies - it is not a mere rehash of COM

In practice, thinking in this order keeps things well organized.

  1. First, determine which of Source Reader / Sink Writer / Media Session / MFT you actually need
  2. Decide the apartment and callback policy up front
  3. Handle media type negotiation and object lifetimes carefully

You do not need to understand everything from the start. Begin with the view that “Media Foundation is a media-processing platform, with COM deeply embedded in its boundary surfaces,” and both the documentation and the code become much easier to follow.


Author Profile


Go Komura

Representative of KomuraSoft LLC

Focused on Windows software development, technical consulting, and investigations into failures that are difficult to reproduce.
