Extracting a Still Image from an MP4 at a Specific Time with Media Foundation
· Go Komura · Media Foundation, C++, Windows Development, WIC
“Grab the single frame at the 12.3-second mark of an MP4” is a perfectly common requirement: thumbnail generation, inspection logs, representative frames from surveillance footage, evidence images for equipment logs, and so on.
In Media Foundation, though, this is just slightly less straightforward than it looks. It seems like calling ReadSample once after SetCurrentPosition should do it, but in practice key frames, timestamps, stride, image orientation, and the fourth byte of RGB32 all get involved. Proceed carelessly and you get the quietly annoying accidents: the time is slightly off, the image is upside down, or the PNG comes out oddly transparent.
For the big picture of Media Foundation itself, our earlier post What Is Media Foundation? - Why the Face of COM and the Windows Media APIs Comes into View is a useful companion. This time we step down one level and focus solely on pulling a single frame out of an MP4.
In this article, we use IMFSourceReader to extract the still image closest to a specified time from an MP4 and save it as a PNG, covering the pitfalls you actually hit in practice along the way. At the end there is a single self-contained code listing you can paste straight into the .cpp of a Visual Studio C++ console app project. The program is not split into fragments across the article: take just the one block at the end and it runs.
The code in this article is also published on GitHub as a complete sample (a one-file, self-contained C++ console app).
media-foundation-extract-still-image-from-mp4-at-specific-time - komurasoft-blog-samples (GitHub)
1. The Conclusion First
Summarizing the conclusions up front:
- For pulling one frame from an MP4, the Source Reader is a more natural entry point than the Media Session
- IMFSourceReader::SetCurrentPosition does not guarantee an exact seek. It normally lands slightly before the target, biased toward a key frame, so you then need to advance with ReadSample and compare frames on either side of the target time
- ReadSample can succeed and still give you pSample == nullptr. Check the flags and pSample, not just the HRESULT
- Steering the output media type to MFVideoFormat_RGB32 makes saving easy
- However, the fourth byte of RGB32 is not necessarily alpha, so writing it straight to PNG can produce a transparent image. It is safest to set it to 0xFF before saving so the image is opaque
- Handle the per-row stride and top-down / bottom-up orientation carelessly and the image breaks, so the easiest approach is to normalize the extracted sample into a top-down, contiguous BGRA buffer before handing it to PNG
In short, seek -> read once -> save is a bit sloppy; go as far as seek -> compare around the target while watching timestamps -> copy with stride in mind -> save as PNG, and things become quite stable.
2. Assumptions for This Article
Our assumptions this time:
- The input is a local MP4 file
- We want a single still image
- We return “the frame closest to the specified time,” not “exactly at the specified time”
- The implementation uses IMFSourceReader in synchronous mode
- The output format is PNG via WIC
- No external libraries — Windows standard APIs only
- An ordinary MP4 whose resolution does not change mid-stream
If you need playback, audio sync, a seek bar, or UI integration, other designs apply — but for “I want one frame,” this approach is very easy to follow.
3. The Tables to Look at First
3.1. The processing flow
| Step | API used | Role |
|---|---|---|
| Open the MP4 | MFCreateSourceReaderFromURL | Create the Source Reader from the file |
| Select video only | SetStreamSelection | Avoid reading audio |
| Convert to RGB32 | SetCurrentMediaType + MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING | Get easy-to-save uncompressed frames |
| Move to the target time | SetCurrentPosition | Seek in 100ns units |
| Read frames | ReadSample | Fetch decoded samples one at a time |
| Compare before / after | sample timestamp | Decide the single frame closest to the target |
| Save as PNG | WIC | Write out the image file |
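Because the seek position is expressed in 100 ns units, a time given in seconds has to be converted first. A minimal sketch of that conversion in portable C++, assuming a hypothetical helper named SecondsToHns (not a Media Foundation API) that returns -1 for input it rejects:

```cpp
#include <cmath>
#include <cstdint>

// Media Foundation expresses time in 100-nanosecond units: 10,000,000 per second.
constexpr std::int64_t kHnsPerSecond = 10000000;

// Hypothetical helper for illustration: converts seconds to 100 ns units,
// rounding to the nearest unit. Returns -1 for negative, non-finite, or
// out-of-range input.
std::int64_t SecondsToHns(double seconds)
{
    if (!std::isfinite(seconds) || seconds < 0.0)
    {
        return -1;
    }
    const double hns = seconds * static_cast<double>(kHnsPerSecond);
    if (hns >= 9.0e18) // stay well inside the int64 range
    {
        return -1;
    }
    return static_cast<std::int64_t>(std::llround(hns));
}
```

With this, 12.3 seconds becomes 123,000,000 units, which is the value that would go into the PROPVARIANT passed to SetCurrentPosition.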
3.2. The selection rule for this article
“The still image at a specified time” sounds simple, but video is discrete frames, not a continuum, so it is easier to decide the selection rule up front.
This time the rule is:
- Advance with ReadSample after the seek
- Keep the last sample with timestamp < target
- When the first sample with timestamp >= target arrives, compare the deltas of the previous and current samples
- Adopt whichever is closer to the target
This makes it easy to get the frame closest to the target, rather than “the first frame at or after the target.”
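The rule can be sketched independently of Media Foundation. The following is an illustration only: PickNearestFrame is a hypothetical function that works on a list of timestamps already in ascending order, whereas the real code sees samples one at a time from ReadSample. Ties go to the earlier frame, which mirrors the <= comparison in the full listing at the end.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the index of the timestamp closest to target, or -1 if the list is
// empty. Timestamps are assumed to be ascending and in 100 ns units.
int PickNearestFrame(const std::vector<std::int64_t>& timestamps, std::int64_t target)
{
    int before = -1; // index of the last frame with timestamp < target
    for (std::size_t i = 0; i < timestamps.size(); ++i)
    {
        if (timestamps[i] < target)
        {
            before = static_cast<int>(i);
            continue;
        }
        // First frame at or after the target.
        if (before < 0)
        {
            return static_cast<int>(i); // nothing earlier to compare against
        }
        const std::int64_t diffBefore = target - timestamps[static_cast<std::size_t>(before)];
        const std::int64_t diffCurrent = timestamps[i] - target;
        return (diffBefore <= diffCurrent) ? before : static_cast<int>(i);
    }
    // The list ended before reaching the target: fall back to the last frame seen.
    return before;
}
```

For frames at 0, 333333, 666667, and 1000000, a target of 600000 selects the third frame (index 2), because 666667 is closer than 333333.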
3.3. The shape of the processing
From start to finish, the flow is roughly input.mp4 -> create the Source Reader -> request RGB32 -> seek -> repeat ReadSample -> compare around the target -> repack into top-down BGRA -> save as PNG via WIC.
It looks simple, but seek precision, null samples, stride, and the fourth byte each hide a small trap. Step around them, and this approach is genuinely easy to use.
4. The Pitfalls to Know in Advance
4.1. SetCurrentPosition is not an exact seek
As Microsoft Learn says for IMFSourceReader::SetCurrentPosition, it does not guarantee exact seeking. For video, it normally lands slightly before the requested position, biased toward a key frame. The expectation is that you then advance to the target position with ReadSample.
Which makes this kind of implementation quite shaky:
- SetCurrentPosition(target)
- One ReadSample
- Save that frame
On videos with long GOPs, this routinely lands well before the target. Nothing fails, so the error is easy to overlook.
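To get a feel for the size of the error, here is a deliberately simplified model. It assumes key frames at fixed intervals starting at time 0 and a seek that lands exactly on the key frame at or before the target. Real files and decoders are not obliged to behave this way, and both function names are made up for this illustration.

```cpp
#include <cstdint>

// Simplified model (an assumption, not documented behavior): key frames occur
// every gopHns starting at time 0, and the seek lands on the key frame at or
// before the target. Requires targetHns >= 0 and gopHns > 0.
std::int64_t KeyFrameAtOrBefore(std::int64_t targetHns, std::int64_t gopHns)
{
    return (targetHns / gopHns) * gopHns;
}

// How far before the target "seek, read once, save" ends up under this model.
std::int64_t NaiveSeekErrorHns(std::int64_t targetHns, std::int64_t gopHns)
{
    return targetHns - KeyFrameAtOrBefore(targetHns, gopHns);
}
```

With a key frame every 2 seconds and a target of 12.3 seconds, this model puts the first decoded frame at 12.0 seconds, which is 0.3 seconds early.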
4.2. ReadSample can succeed with pSample == nullptr
ReadSample can return S_OK and still leave the sample pointer returned through ppSample set to NULL. At the end of the stream it sets the MF_SOURCE_READERF_ENDOFSTREAM flag; for a gap in the stream it sets MF_SOURCE_READERF_STREAMTICK; and so on.
Checking only the HRESULT and immediately dereferencing pSample is dangerous. The safe habit is to look at all three: HRESULT, flags, and pSample.
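As a sketch of that habit, the decision can be written as a small pure function. The constants below are stand-ins so that the snippet compiles without the Windows SDK; in real code the values come from HRESULT, MF_SOURCE_READERF_ENDOFSTREAM, and MF_SOURCE_READERF_STREAMTICK, and ClassifyRead and ReadAction are hypothetical names.

```cpp
#include <cstdint>

// Illustrative stand-ins for the flag bits. Real code should use
// MF_SOURCE_READERF_ENDOFSTREAM and MF_SOURCE_READERF_STREAMTICK from the
// Windows SDK headers rather than these constants.
constexpr std::uint32_t kFlagEndOfStream = 0x2;
constexpr std::uint32_t kFlagStreamTick = 0x100;

enum class ReadAction { Fail, Stop, Skip, UseSample };

// Decides what to do with one ReadSample result by checking all three outputs.
// hr follows the HRESULT convention: negative values are failures.
ReadAction ClassifyRead(long hr, std::uint32_t flags, const void* sample)
{
    if (hr < 0)
    {
        return ReadAction::Fail;
    }
    if ((flags & kFlagEndOfStream) != 0)
    {
        return ReadAction::Stop; // no more samples will arrive
    }
    if ((flags & kFlagStreamTick) != 0 || sample == nullptr)
    {
        return ReadAction::Skip; // success, but there is nothing to decode here
    }
    return ReadAction::UseSample;
}
```

Only the UseSample result makes it safe to touch the sample pointer; every other result either ends the loop or goes around again.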
4.3. Handle stride and orientation carelessly and the image breaks
An image buffer is not necessarily packed in a straight line of width * bytesPerPixel. Rows may carry end-of-line padding, and RGB formats may be bottom-up. Microsoft Learn’s Image Stride and Uncompressed Video Buffers state this quite plainly.
The two especially important points:
- IMF2DBuffer::Lock2D returns the pointer to the start of scan line 0 and the actual stride
- For bottom-up images, the stride can be negative
This article adopts the approach of the Microsoft Learn helpers and repacks everything into a top-down, contiguous BGRA buffer before passing it to PNG. Settle this first, and the save side becomes very simple.
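The repacking step itself needs nothing from Media Foundation once scan line 0 and the stride are known. A minimal sketch, assuming 4 bytes per pixel and a hypothetical RepackTopDown helper; the caller is assumed to pass a scan line 0 pointer and stride that are valid for the whole image.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copies height rows of width 32-bit pixels into a tightly packed, top-down
// buffer. scanLine0 points at the top row of the image; stride is the byte
// distance to the next row down and is negative for a bottom-up source.
std::vector<std::uint8_t> RepackTopDown(
    const std::uint8_t* scanLine0,
    std::ptrdiff_t stride,
    std::size_t width,
    std::size_t height)
{
    const std::size_t rowBytes = width * 4;
    std::vector<std::uint8_t> out(rowBytes * height);
    if (rowBytes == 0)
    {
        return out; // nothing to copy
    }
    for (std::size_t y = 0; y < height; ++y)
    {
        // With a negative stride this walks upward through memory.
        const std::uint8_t* src = scanLine0 + static_cast<std::ptrdiff_t>(y) * stride;
        // Copy only the pixel bytes; any end-of-row padding is dropped.
        std::memcpy(out.data() + y * rowBytes, src, rowBytes);
    }
    return out;
}
```

The output always has a stride of exactly width * 4, so the code that writes the PNG no longer needs to know how the source buffer was laid out.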
4.4. Don’t assume the fourth byte of MFVideoFormat_RGB32 is alpha
Despite the vibe of the name, MFVideoFormat_RGB32 is not “clean RGBA” you can pass straight to PNG. Windows 32-bit RGB has bytes 0, 1, 2 as B, G, R, and byte 3 may be alpha or may be ignored. The key point is that it is not ARGB32.
Assume it is GUID_WICPixelFormat32bppBGRA and save it as is, and you may find zeros in the fourth byte and an oddly transparent image. Our policy here is to fill alpha with 0xFF before saving so the image is fully opaque.
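Forcing the fourth byte is a short loop. A minimal sketch over a tightly packed 4-bytes-per-pixel buffer, with ForceOpaque as a hypothetical helper name:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sets byte 3 of every 4-byte pixel to 0xFF, leaving B, G, and R untouched.
// Assumes a tightly packed buffer with no padding between rows.
void ForceOpaque(std::vector<std::uint8_t>& bgra)
{
    for (std::size_t i = 3; i < bgra.size(); i += 4)
    {
        bgra[i] = 0xFF;
    }
}
```

After this pass the buffer can be described to WIC as 32bpp BGRA without the risk of an unintentionally transparent result.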
5. The Implementation Flow
5.1. Create the Source Reader in synchronous mode
Since we only need one frame, we use synchronous ReadSample instead of asynchronous callbacks. In synchronous mode, ReadSample blocks until the next sample is available, but for one-shot still-image extraction that is acceptable and the implementation stays very straightforward.
When creating the reader, we do these four things:
- MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING = TRUE
- Turn all streams off first
- Turn on only MF_SOURCE_READER_FIRST_VIDEO_STREAM
- Set the output type to MFMediaType_Video / MFVideoFormat_RGB32
After that, the downstream code can be written on the assumption “we receive RGB32 frames.”
5.2. After the seek, close in while watching timestamps
After SetCurrentPosition, we do not save immediately. We read samples with ReadSample, comparing the last frame before the target with the first frame that crosses the target.
This one extra step absorbs most of the seek’s coarseness.
5.3. Normalize the sample into top-down BGRA
We do not write the extracted sample to PNG directly; we first repack it into a top-down BGRA buffer.
- Combine into a single buffer with ConvertToContiguousBuffer
- Get scan line 0 and the actual stride via the BufferLock helper
- Copy row by row into the top-down buffer
- Set alpha to 0xFF
Now the save side can treat it as “just a 32bpp BGRA image.”
5.4. Leave PNG saving to WIC
Saving uses WIC’s IWICBitmapEncoder / IWICBitmapFrameEncode. Media Foundation fetches the frame; WIC turns it into an image. The whole thing stays within Windows standard APIs.
6. A Practical Checklist
| Item | What to check | What tends to happen if missed |
|---|---|---|
| Seek accuracy | Don’t decide on the single read right after SetCurrentPosition | You save a frame well before the requested time |
| Null samples | Check all of HRESULT, flags, and pSample | Null dereference at end of stream or on a stream tick |
| Stride | Absorb the actual stride and the vertical orientation | The image breaks or comes out upside down |
| The 4th byte of RGB32 | Set alpha to 0xFF | A transparent PNG |
| Time range | Keep 0 <= target < duration | Unintended behavior near the end of the stream |
| Repeated extraction | Repeat seeks instead of recreating the reader | Needlessly slow |
| Copy count | For bulk processing, mind the cost of ConvertToContiguousBuffer | Wasted CPU and memory bandwidth |
| Format changes | Handle videos whose resolution changes mid-stream with a separate design | Width/height assumptions break |
7. Build and Run Notes
The code at the end of this article is shaped to be easy to add as a single .cpp to a Visual Studio C++ console app.
A few things worth knowing:
- #pragma comment(lib, ...) directives are included, so additional linker settings are generally unnecessary
- wmain is used, so command-line arguments are handled in Unicode throughout
- For default Console App templates with pch.h or stdafx.h, the top of the code uses __has_include to pick them up so pasting still works
- If your project still forces its own precompiled header, set “Not Using Precompiled Headers” for just this .cpp and it builds
- x64 is the recommended configuration
Usage is ExtractFrameFromMp4.exe <input.mp4> <seconds> <output.png>. For example: ExtractFrameFromMp4.exe C:\work\input.mp4 12.345 C:\work\frame.png.
8. Summary
When extracting a still image at a specified time from an MP4 with Media Foundation, looking only at SetCurrentPosition and ReadSample is not quite enough. In reality:
- The seek is not exact
- Frames should be compared around the target by timestamp
- A successful ReadSample may still carry no sample
- Absorb the stride and image orientation before saving
- Don’t assume the fourth byte of RGB32 is alpha
Cover these, and accidents become genuinely rare.
As a minimal setup focused on pulling one frame properly, this sample should be very usable. It carries over directly to thumbnail generation, saving representative frames from surveillance footage, and emitting evidence images for inspection logs.
9. References
- Complete sample code for this article: media-foundation-extract-still-image-from-mp4-at-specific-time - komurasoft-blog-samples (GitHub)
- Microsoft Learn: Using the Source Reader to Process Media Data
- Microsoft Learn: IMFSourceReader::SetCurrentPosition
- Microsoft Learn: IMFSourceReader::ReadSample
- Microsoft Learn: IMFSourceReader::SetCurrentMediaType
- Microsoft Learn: IMF2DBuffer
- Microsoft Learn: IMF2DBuffer::Lock2D
- Microsoft Learn: Uncompressed Video Buffers
- Microsoft Learn: Image Stride
- Microsoft Learn: MF_MT_FRAME_SIZE attribute
- Microsoft Learn: MF_MT_DEFAULT_STRIDE attribute
- Microsoft Learn: Native pixel formats overview (WIC)
- Microsoft Learn: Uncompressed RGB Video Subtypes
10. The Full Code, Ready to Paste into a .cpp
The single block below is meant to be carried straight into a Visual Studio C++ console app project. The command-line arguments are, in order, input.mp4, seconds, and output.png. It is a one-file, self-contained layout, so it pastes into a project easily.
#define NOMINMAX
#if defined(_MSC_VER)
# if __has_include("pch.h")
# include "pch.h"
# elif __has_include("stdafx.h")
# include "stdafx.h"
# endif
#endif
#include <windows.h>
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <mferror.h>
#include <mfobjects.h>
#include <propvarutil.h>
#include <wincodec.h>
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cwchar>
#include <cmath>
#include <cstring>
#include <limits>
#include <vector>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfreadwrite.lib")
#pragma comment(lib, "mfuuid.lib")
#pragma comment(lib, "ole32.lib")
#pragma comment(lib, "propsys.lib")
#pragma comment(lib, "windowscodecs.lib")
template <class T>
void SafeRelease(T** pp)
{
if (pp != nullptr && *pp != nullptr)
{
(*pp)->Release();
*pp = nullptr;
}
}
class MediaFoundationScope
{
public:
MediaFoundationScope() : m_comInitialized(false), m_mfStarted(false)
{
}
HRESULT Initialize()
{
HRESULT hr = CoInitializeEx(nullptr, COINIT_MULTITHREADED);
if (hr == RPC_E_CHANGED_MODE)
{
return hr;
}
if (SUCCEEDED(hr))
{
m_comInitialized = true;
}
hr = MFStartup(MF_VERSION);
if (FAILED(hr))
{
if (m_comInitialized)
{
CoUninitialize();
m_comInitialized = false;
}
return hr;
}
m_mfStarted = true;
return S_OK;
}
~MediaFoundationScope()
{
if (m_mfStarted)
{
MFShutdown();
}
if (m_comInitialized)
{
CoUninitialize();
}
}
private:
bool m_comInitialized;
bool m_mfStarted;
};
HRESULT GetPresentationDuration(IMFSourceReader* pReader, LONGLONG* phnsDuration)
{
if (pReader == nullptr || phnsDuration == nullptr)
{
return E_POINTER;
}
PROPVARIANT var;
PropVariantInit(&var);
HRESULT hr = pReader->GetPresentationAttribute(
MF_SOURCE_READER_MEDIASOURCE,
MF_PD_DURATION,
&var);
if (SUCCEEDED(hr))
{
hr = PropVariantToInt64(var, phnsDuration);
}
PropVariantClear(&var);
return hr;
}
HRESULT GetDefaultStride(IMFMediaType* pType, LONG* plStride)
{
if (pType == nullptr || plStride == nullptr)
{
return E_POINTER;
}
LONG lStride = 0;
HRESULT hr = pType->GetUINT32(
MF_MT_DEFAULT_STRIDE,
reinterpret_cast<UINT32*>(&lStride));
if (FAILED(hr))
{
GUID subtype = GUID_NULL;
UINT32 width = 0;
UINT32 height = 0;
hr = pType->GetGUID(MF_MT_SUBTYPE, &subtype);
if (FAILED(hr))
{
return hr;
}
hr = MFGetAttributeSize(pType, MF_MT_FRAME_SIZE, &width, &height);
if (FAILED(hr))
{
return hr;
}
hr = MFGetStrideForBitmapInfoHeader(subtype.Data1, width, &lStride);
if (FAILED(hr))
{
return hr;
}
(void)pType->SetUINT32(MF_MT_DEFAULT_STRIDE, static_cast<UINT32>(lStride));
}
*plStride = lStride;
return S_OK;
}
class BufferLock
{
public:
explicit BufferLock(IMFMediaBuffer* pBuffer)
: m_pBuffer(pBuffer),
m_p2DBuffer(nullptr),
m_locked(false)
{
if (m_pBuffer != nullptr)
{
m_pBuffer->AddRef();
(void)m_pBuffer->QueryInterface(IID_PPV_ARGS(&m_p2DBuffer));
}
}
~BufferLock()
{
UnlockBuffer();
SafeRelease(&m_p2DBuffer);
SafeRelease(&m_pBuffer);
}
HRESULT LockBuffer(
LONG defaultStride,
DWORD heightInPixels,
BYTE** ppScanLine0,
LONG* plStride)
{
if (ppScanLine0 == nullptr || plStride == nullptr)
{
return E_POINTER;
}
*ppScanLine0 = nullptr;
*plStride = 0;
HRESULT hr = S_OK;
if (m_p2DBuffer != nullptr)
{
hr = m_p2DBuffer->Lock2D(ppScanLine0, plStride);
}
else
{
BYTE* pData = nullptr;
hr = m_pBuffer->Lock(&pData, nullptr, nullptr);
if (SUCCEEDED(hr))
{
*plStride = defaultStride;
if (defaultStride < 0)
{
const size_t strideAbs = static_cast<size_t>(-defaultStride);
*ppScanLine0 = pData + strideAbs * (heightInPixels - 1);
}
else
{
*ppScanLine0 = pData;
}
}
}
m_locked = SUCCEEDED(hr);
return hr;
}
void UnlockBuffer()
{
if (!m_locked)
{
return;
}
if (m_p2DBuffer != nullptr)
{
(void)m_p2DBuffer->Unlock2D();
}
else if (m_pBuffer != nullptr)
{
(void)m_pBuffer->Unlock();
}
m_locked = false;
}
private:
IMFMediaBuffer* m_pBuffer;
IMF2DBuffer* m_p2DBuffer;
bool m_locked;
};
HRESULT CreateConfiguredSourceReader(PCWSTR inputPath, IMFSourceReader** ppReader)
{
if (inputPath == nullptr || ppReader == nullptr)
{
return E_POINTER;
}
*ppReader = nullptr;
IMFAttributes* pAttributes = nullptr;
IMFSourceReader* pReader = nullptr;
IMFMediaType* pRequestedType = nullptr;
HRESULT hr = MFCreateAttributes(&pAttributes, 1);
if (FAILED(hr))
{
goto done;
}
hr = pAttributes->SetUINT32(MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING, TRUE);
if (FAILED(hr))
{
goto done;
}
hr = MFCreateSourceReaderFromURL(inputPath, pAttributes, &pReader);
if (FAILED(hr))
{
goto done;
}
hr = pReader->SetStreamSelection(MF_SOURCE_READER_ALL_STREAMS, FALSE);
if (FAILED(hr))
{
goto done;
}
hr = pReader->SetStreamSelection(MF_SOURCE_READER_FIRST_VIDEO_STREAM, TRUE);
if (FAILED(hr))
{
goto done;
}
hr = MFCreateMediaType(&pRequestedType);
if (FAILED(hr))
{
goto done;
}
hr = pRequestedType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
if (FAILED(hr))
{
goto done;
}
hr = pRequestedType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB32);
if (FAILED(hr))
{
goto done;
}
hr = pReader->SetCurrentMediaType(
MF_SOURCE_READER_FIRST_VIDEO_STREAM,
nullptr,
pRequestedType);
if (FAILED(hr))
{
goto done;
}
*ppReader = pReader;
pReader = nullptr;
done:
SafeRelease(&pRequestedType);
SafeRelease(&pReader);
SafeRelease(&pAttributes);
return hr;
}
HRESULT SeekSourceReader(IMFSourceReader* pReader, LONGLONG targetHns)
{
if (pReader == nullptr)
{
return E_POINTER;
}
PROPVARIANT var;
PropVariantInit(&var);
HRESULT hr = InitPropVariantFromInt64(targetHns, &var);
if (SUCCEEDED(hr))
{
hr = pReader->SetCurrentPosition(GUID_NULL, var);
}
PropVariantClear(&var);
return hr;
}
HRESULT ReadNearestVideoSample(
IMFSourceReader* pReader,
LONGLONG targetHns,
IMFSample** ppSample,
LONGLONG* pChosenTimestampHns)
{
if (pReader == nullptr || ppSample == nullptr)
{
return E_POINTER;
}
*ppSample = nullptr;
if (pChosenTimestampHns != nullptr)
{
*pChosenTimestampHns = 0;
}
IMFSample* pBefore = nullptr;
LONGLONG beforeTimestamp = 0;
bool hasBefore = false;
HRESULT hr = S_OK;
for (;;)
{
IMFSample* pCurrent = nullptr;
DWORD flags = 0;
LONGLONG currentTimestamp = 0;
LONGLONG diffBefore = 0;
LONGLONG diffCurrent = 0;
hr = pReader->ReadSample(
MF_SOURCE_READER_FIRST_VIDEO_STREAM,
0,
nullptr,
&flags,
&currentTimestamp,
&pCurrent);
if (FAILED(hr))
{
SafeRelease(&pCurrent);
break;
}
if ((flags & MF_SOURCE_READERF_ENDOFSTREAM) != 0)
{
SafeRelease(&pCurrent);
if (hasBefore)
{
*ppSample = pBefore;
pBefore = nullptr;
if (pChosenTimestampHns != nullptr)
{
*pChosenTimestampHns = beforeTimestamp;
}
hr = S_OK;
}
else
{
hr = MF_E_END_OF_STREAM;
}
break;
}
if ((flags & MF_SOURCE_READERF_STREAMTICK) != 0)
{
SafeRelease(&pCurrent);
continue;
}
if (pCurrent == nullptr)
{
continue;
}
if (currentTimestamp < targetHns)
{
SafeRelease(&pBefore);
pBefore = pCurrent;
pCurrent = nullptr;
beforeTimestamp = currentTimestamp;
hasBefore = true;
continue;
}
if (hasBefore)
{
diffBefore = targetHns - beforeTimestamp;
diffCurrent = currentTimestamp - targetHns;
if (diffBefore <= diffCurrent)
{
*ppSample = pBefore;
pBefore = nullptr;
if (pChosenTimestampHns != nullptr)
{
*pChosenTimestampHns = beforeTimestamp;
}
SafeRelease(&pCurrent);
}
else
{
*ppSample = pCurrent;
pCurrent = nullptr;
if (pChosenTimestampHns != nullptr)
{
*pChosenTimestampHns = currentTimestamp;
}
}
}
else
{
*ppSample = pCurrent;
pCurrent = nullptr;
if (pChosenTimestampHns != nullptr)
{
*pChosenTimestampHns = currentTimestamp;
}
}
hr = S_OK;
break;
}
SafeRelease(&pBefore);
return hr;
}
HRESULT CopyContiguousBufferToTopDownBgra(
IMFMediaBuffer* pBuffer,
LONG defaultStride,
UINT32 width,
UINT32 height,
std::vector<BYTE>& pixels,
UINT32* pStride)
{
if (pBuffer == nullptr || pStride == nullptr)
{
return E_POINTER;
}
BufferLock lock(pBuffer);
BYTE* pScanLine0 = nullptr;
LONG actualStride = 0;
HRESULT hr = lock.LockBuffer(defaultStride, height, &pScanLine0, &actualStride);
if (FAILED(hr))
{
return hr;
}
if (width > (std::numeric_limits<UINT32>::max() / 4))
{
return E_INVALIDARG;
}
const UINT32 destStride = width * 4;
const LONG actualStrideAbs = (actualStride < 0) ? -actualStride : actualStride;
if (actualStrideAbs < static_cast<LONG>(destStride))
{
return E_UNEXPECTED;
}
pixels.resize(static_cast<size_t>(destStride) * height);
BYTE* pDestRow = pixels.data();
BYTE* pSrcRow = pScanLine0;
for (UINT32 y = 0; y < height; ++y)
{
std::memcpy(pDestRow, pSrcRow, destStride);
// The 4th byte of MFVideoFormat_RGB32 is not necessarily alpha,
// so force it to opaque before saving as PNG.
for (UINT32 x = 0; x < width; ++x)
{
pDestRow[static_cast<size_t>(x) * 4 + 3] = 0xFF;
}
pDestRow += destStride;
pSrcRow += actualStride;
}
*pStride = destStride;
return S_OK;
}
HRESULT CopySampleToTopDownBgra(
IMFSample* pSample,
IMFMediaType* pCurrentType,
std::vector<BYTE>& pixels,
UINT32* pWidth,
UINT32* pHeight,
UINT32* pStride)
{
if (pSample == nullptr || pCurrentType == nullptr ||
pWidth == nullptr || pHeight == nullptr || pStride == nullptr)
{
return E_POINTER;
}
*pWidth = 0;
*pHeight = 0;
*pStride = 0;
IMFMediaBuffer* pBuffer = nullptr;
GUID subtype = GUID_NULL;
UINT32 width = 0;
UINT32 height = 0;
LONG defaultStride = 0;
HRESULT hr = pCurrentType->GetGUID(MF_MT_SUBTYPE, &subtype);
if (FAILED(hr))
{
goto done;
}
if (!IsEqualGUID(subtype, MFVideoFormat_RGB32))
{
hr = MF_E_INVALIDMEDIATYPE;
goto done;
}
hr = MFGetAttributeSize(pCurrentType, MF_MT_FRAME_SIZE, &width, &height);
if (FAILED(hr))
{
goto done;
}
if (width == 0 || height == 0)
{
hr = E_UNEXPECTED;
goto done;
}
hr = GetDefaultStride(pCurrentType, &defaultStride);
if (FAILED(hr))
{
goto done;
}
hr = pSample->ConvertToContiguousBuffer(&pBuffer);
if (FAILED(hr))
{
goto done;
}
hr = CopyContiguousBufferToTopDownBgra(
pBuffer,
defaultStride,
width,
height,
pixels,
pStride);
if (FAILED(hr))
{
goto done;
}
*pWidth = width;
*pHeight = height;
hr = S_OK;
done:
SafeRelease(&pBuffer);
return hr;
}
HRESULT SaveBgraToPng(
PCWSTR outputPath,
const BYTE* pixels,
UINT32 width,
UINT32 height,
UINT32 stride)
{
if (outputPath == nullptr || pixels == nullptr)
{
return E_POINTER;
}
if (width == 0 || height == 0 || stride < width * 4)
{
return E_INVALIDARG;
}
const size_t bufferSizeSizeT = static_cast<size_t>(stride) * height;
if (bufferSizeSizeT > static_cast<size_t>(std::numeric_limits<UINT>::max()))
{
return E_INVALIDARG;
}
const UINT bufferSize = static_cast<UINT>(bufferSizeSizeT);
IWICImagingFactory* pFactory = nullptr;
IWICStream* pStream = nullptr;
IWICBitmapEncoder* pEncoder = nullptr;
IWICBitmapFrameEncode* pFrame = nullptr;
IPropertyBag2* pProps = nullptr;
WICPixelFormatGUID pixelFormat = GUID_WICPixelFormat32bppBGRA;
HRESULT hr = CoCreateInstance(
CLSID_WICImagingFactory,
nullptr,
CLSCTX_INPROC_SERVER,
IID_PPV_ARGS(&pFactory));
if (FAILED(hr))
{
goto done;
}
hr = pFactory->CreateStream(&pStream);
if (FAILED(hr))
{
goto done;
}
hr = pStream->InitializeFromFilename(outputPath, GENERIC_WRITE);
if (FAILED(hr))
{
goto done;
}
hr = pFactory->CreateEncoder(GUID_ContainerFormatPng, nullptr, &pEncoder);
if (FAILED(hr))
{
goto done;
}
hr = pEncoder->Initialize(pStream, WICBitmapEncoderNoCache);
if (FAILED(hr))
{
goto done;
}
hr = pEncoder->CreateNewFrame(&pFrame, &pProps);
if (FAILED(hr))
{
goto done;
}
hr = pFrame->Initialize(pProps);
if (FAILED(hr))
{
goto done;
}
hr = pFrame->SetSize(width, height);
if (FAILED(hr))
{
goto done;
}
hr = pFrame->SetPixelFormat(&pixelFormat);
if (FAILED(hr))
{
goto done;
}
if (!IsEqualGUID(pixelFormat, GUID_WICPixelFormat32bppBGRA))
{
hr = WINCODEC_ERR_UNSUPPORTEDPIXELFORMAT;
goto done;
}
hr = pFrame->WritePixels(
height,
stride,
bufferSize,
const_cast<BYTE*>(pixels));
if (FAILED(hr))
{
goto done;
}
hr = pFrame->Commit();
if (FAILED(hr))
{
goto done;
}
hr = pEncoder->Commit();
done:
SafeRelease(&pProps);
SafeRelease(&pFrame);
SafeRelease(&pEncoder);
SafeRelease(&pStream);
SafeRelease(&pFactory);
return hr;
}
HRESULT ExtractFrameFromMp4ToPng(
PCWSTR inputPath,
LONGLONG targetHns,
PCWSTR outputPath,
LONGLONG* pActualTimestampHns)
{
if (inputPath == nullptr || outputPath == nullptr)
{
return E_POINTER;
}
if (targetHns < 0)
{
return E_INVALIDARG;
}
MediaFoundationScope mf;
HRESULT hr = mf.Initialize();
if (FAILED(hr))
{
return hr;
}
IMFSourceReader* pReader = nullptr;
IMFMediaType* pCurrentType = nullptr;
IMFSample* pChosenSample = nullptr;
LONGLONG durationHns = 0;
UINT32 width = 0;
UINT32 height = 0;
UINT32 stride = 0;
std::vector<BYTE> pixels;
hr = CreateConfiguredSourceReader(inputPath, &pReader);
if (FAILED(hr))
{
goto done;
}
hr = pReader->GetCurrentMediaType(
MF_SOURCE_READER_FIRST_VIDEO_STREAM,
&pCurrentType);
if (FAILED(hr))
{
goto done;
}
hr = GetPresentationDuration(pReader, &durationHns);
if (FAILED(hr))
{
goto done;
}
if (targetHns >= durationHns)
{
hr = E_INVALIDARG;
goto done;
}
hr = SeekSourceReader(pReader, targetHns);
if (FAILED(hr))
{
goto done;
}
hr = ReadNearestVideoSample(
pReader,
targetHns,
&pChosenSample,
pActualTimestampHns);
if (FAILED(hr))
{
goto done;
}
hr = CopySampleToTopDownBgra(
pChosenSample,
pCurrentType,
pixels,
&width,
&height,
&stride);
if (FAILED(hr))
{
goto done;
}
hr = SaveBgraToPng(outputPath, pixels.data(), width, height, stride);
done:
SafeRelease(&pChosenSample);
SafeRelease(&pCurrentType);
SafeRelease(&pReader);
return hr;
}
bool TryParseSeconds(PCWSTR text, LONGLONG* phns)
{
if (text == nullptr || phns == nullptr)
{
return false;
}
wchar_t* end = nullptr;
errno = 0;
const double seconds = std::wcstod(text, &end);
if (end == text || *end != L'\0' || errno != 0)
{
return false;
}
if (!std::isfinite(seconds) || seconds < 0.0)
{
return false;
}
const long double hns =
static_cast<long double>(seconds) * 10000000.0L;
if (hns < 0.0L ||
hns > static_cast<long double>(std::numeric_limits<LONGLONG>::max()))
{
return false;
}
*phns = static_cast<LONGLONG>(std::llround(hns));
return true;
}
double HnsToSeconds(LONGLONG hns)
{
return static_cast<double>(hns) / 10000000.0;
}
void PrintUsage()
{
std::fwprintf(stderr, L"Usage:\n");
std::fwprintf(stderr, L" ExtractFrameFromMp4.exe <input.mp4> <seconds> <output.png>\n");
std::fwprintf(stderr, L"\nExample:\n");
std::fwprintf(stderr, L" ExtractFrameFromMp4.exe input.mp4 12.345 output.png\n");
}
int wmain(int argc, wchar_t* argv[])
{
if (argc != 4)
{
PrintUsage();
return 1;
}
LONGLONG targetHns = 0;
if (!TryParseSeconds(argv[2], &targetHns))
{
std::fwprintf(stderr, L"Invalid seconds: %ls\n", argv[2]);
return 1;
}
LONGLONG actualHns = 0;
HRESULT hr = ExtractFrameFromMp4ToPng(
argv[1],
targetHns,
argv[3],
&actualHns);
if (FAILED(hr))
{
std::fwprintf(stderr, L"Failed. HRESULT = 0x%08lX\n", static_cast<unsigned long>(hr));
return 1;
}
std::wprintf(L"Saved: %ls\n", argv[3]);
std::wprintf(L"Requested: %.3f sec\n", HnsToSeconds(targetHns));
std::wprintf(L"Actual: %.3f sec\n", HnsToSeconds(actualHns));
return 0;
}
Author Profile
Profile page for the article author.
Go Komura
Representative of KomuraSoft LLC
Focused on Windows software development, technical consulting, and investigations into failures that are difficult to reproduce.