MPC-HC and MadVR Setup Guide

Foreword

This article is a second draft.

Introduction

MPC-HC (original outdated build || forked & maintained build) is a classic video player with solid functionality. It’s not the prettiest thing in 2021, but function over form, right?

For users that love tweaking, or users with potato PCs, I recommend using mpv instead.

For casual users, I recommend using MPC. It is easy to configure and really user friendly. It may not be as efficient as mpv, but its ease of use and its proper UI in place of command lines and config files make it the better choice for casual users.

MPC-HC is also very flexible with external components. One of the reasons it has stayed relevant after 17+ years is its reliance on external components/filters, which allow it to decode modern formats despite its age.

MadVR is a powerful DirectShow renderer that replaces the antiquated Enhanced Video Renderer (EVR) from the Windows Vista days. It is crucial to use MadVR, as it not only provides powerful scalers and filtering functionality but also fixes some errors with the default renderer (e.g. 10-bit video has incorrect, pixelated chroma with the default renderer).

FFDShow allows MPC to interface with SVP 4 if you plan to use it. FFDshow also allows running custom scripts and AVISynth, but that’s advanced stuff and even I don’t touch it.

Please consult Google for audio options if you have a surround sound home theater. I only have headphones and stereo studio monitors.

Installation

There are many ways to install MPC-HC, ranging from codec packs to installing each component yourself. While my favorite way to install MPC-HC is via the K-Lite Codec Pack (K-Lite comes with a neat icon pack), I recommend avoiding codec packs unless you are familiar with each component and know how to configure them properly. They are convenient, but they make it easy to install duplicate components and they break rather easily if you don’t know what you’re doing. However, since I’m a hypocrite, I’m using a codec pack anyway.

If you plan to use SVP 4, its installer also gives you the option to install MPC-HC and the required components. I recommend installing all the components with the SVP 4 installer so you don’t need to set up the FFDShow scripts (you can technically do it manually, but it’s a HUGE pain).

TL;DR: Install components manually if possible. If you’re lazy, just remember not to install duplicate components.

MPC-HC

Skip this section if you plan to use SVP.

MPC-HC officially has ceased development. However, you can find the community maintained releases here. Scroll down and get the x64 installer for the latest release build (avoid the dev build).

An alternative MPC-BE is also available. There are some slight differences, but setup should be similar. Nothing a Google search can’t solve though.

MadVR

Skip this section if you plan to use SVP.

MadVR can be installed by simply extracting the zip file to a safe location and running install.bat with admin rights. MadVR does not copy its files anywhere; it installs from whatever directory you extracted the folder to, so don’t move or delete that folder afterwards.

SVP 4

Preface: I highly recommend using SVP 4 with mpv instead to save you lots of trouble.

SVP 4 can be easily installed with the installer. I recommend installing the core SVP programs and mpv. The bundled mpv is a pre-configured VPY-enabled build placed in the SVP 4 installation directory. I recommend creating a ~\portable_config\ directory inside the mpv folder so you can run a separate mpv.conf. See my mpv setup guide to learn more.

Components to install (marked [x] are required components for the 64-bit MPC-HC to function):

+ [x] SVP Manager (Pro)
  + [x] [DS_64] Core for DirectShow 64-bit
  + [ ] [VPS_64] Core for Vapoursynth 64-bit (install if you also want mpv, recommended)
+ [ ] SVP Extensions
  + [ ] SVPcode (SVP transcoding tool)
  + [ ] SVPtube 2 (youtube-dl tool, recommended)
+ [x] 3rd-party software
  + [x] [DS32/64] AviSynth Filter
  + [x] [DS_64] ffdshow filters 64-bit
+ [x] 3rd-party software (optional)
  + [x] [DS_64] MPC-HC 64-bit 
  + [x] [DS32/64] madVR video renderer
  + [ ] [DS32/64] LAV filter (MPC-HC provides built-in LAV, so no need to install unless you need it for specific purposes)
  + [ ] [VPS_64] mpv video player (install if you want mpv, recommended)

Read the guide on how to set it up.

Unfortunately, some of the installed components are a bit older. You can try updating them by overwriting the files and keeping the configs. It should work, although don’t quote me on this.

You can try installing components manually, but only do it if you’re confident you know how to set them up manually (need to manually link FFDShow Avisynth script to the SVP script as seen below). If you installed everything with SVP 4, this is all done automatically.

(Screenshot: ffdshow avisynth)

K-Lite Codec Pack

Ignore this section if you plan to use SVP.

There are lots of codec packs out there, but K-Lite is my favorite. Again, don’t touch codec packs unless you know what’s going on, or you’re very careful with them. They tend to “manage” things for you and sometimes break things.

I recommend installing the Standard pack, then manually installing MadVR. K-Lite Codec Pack provides some useful tools and a neat icon pack with it. It even registers the app with Windows Graphics settings by default, so if you’re on a laptop you can go to the settings and force it to use your dedicated GPU.

MPC-HC Basic Setup

To open the options, select View -> Options on the menu bar. This guide assumes SVP 4 is NOT installed. If it is, there will be some differences (FFDShow stuff and script setup).

Player

Player options should be self explanatory. Take your time to configure it to your liking.

Formats, Keys, Logo, Web Interface

Pretty self explanatory. Registering MPC as the default app under Formats can be a bit frustrating. You will need to click Run as Admin, select video/audio/both, click done and done again, and then you can register MPC as the default app in settings.

Keys are for shortcut keys. Make sure there are no duplicates!

Playback

Self explanatory options.

Output

Once you’ve got MadVR installed, you can switch DirectShow Video (1st option) to the MadVR renderer. Leave the rest at their defaults.

(Screenshot: Output)

Shaders

Hover over the shader files to see where they are located. You can download my provided debanding shaders here. Extract the files to where all the other shaders are installed at.

You can create new presets and add shaders to both pre- and post-resize. Only add to pre-resize unless you know what you’re doing. Photo example is me creating a light deband preset and loading it to the pre-resize filter. Remember to click save to save the shader preset after configuring!

You can access preset shaders quickly via right clicking when watching a video. This should also allow you to see how effective the shader is (you can create a “blank” preset with no shaders, and toggle between profiles to see its effect).

(Screenshot: Shaders)

Fullscreen

This allows the monitor to quickly switch refresh rate in fullscreen mode to match the source. I recommend not touching this and using MadVR’s function instead if you want to change the refresh rate of your monitor.

Internal Filters

The bottom 3 buttons allow access to the LAV filter options. Note that these are the built-in LAV filters. If you installed MPC-HC via other means (e.g. codec pack), it’s possible that an external LAV filter has been installed. If so, it can usually be accessed via its system tray icon.

LAV filter config will be discussed later.

Audio Switcher/Renderer

Audio options. Consult other guides if you require advanced audio settings. The only relevant option here is the normalize filter, which imo should be disabled since it doesn’t work very well anyways. For music file normalization, MPC-HC supports replaygain tags, but just use a proper music player like Foobar2K at that point.

External Filters

Skip if you don’t plan to use SVP 4

Read the guide on how to set up FFDShow for SVP. Or just use the MPC-HC installed via SVP 4, where it’s set up by default.

Subtitles

Sub settings. Self explanatory. If you have a weak system, you can consider lowering subtitle animation frame rate (some fancier OP/ED have trouble rendering on weaker systems).

Tweaks and Misc

Self explanatory. Fast seek (keyframe) allows faster seeking, but because it snaps to keyframes, some videos will land rather far from the intended time. Disable fast seek if you need seeks to land exactly on the times defined by your jump distances.
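
To see why keyframe seeking can overshoot, here is a rough Python sketch. It assumes a fast seek snaps to the closest keyframe at or before the target, which is a simplification and not necessarily what MPC-HC does internally. The keyframe timestamps are made up for illustration.

```python
from bisect import bisect_right

def fast_seek(keyframes, target):
    """Return the keyframe timestamp a keyframe-based seek would land on.

    Assumption: the seek snaps to the closest keyframe at or before the target.
    """
    index = bisect_right(keyframes, target) - 1
    return keyframes[max(index, 0)]

# Hypothetical keyframe timestamps in seconds (sparse, as in some web encodes).
keyframes = [0.0, 4.0, 12.0, 22.0, 30.0]
position = 10.0
jump = 5.0  # a "jump forward 5 seconds" shortcut

exact = position + jump
landed = fast_seek(keyframes, exact)
print(f"exact seek: {exact:.1f}s, fast seek: {landed:.1f}s, off by {exact - landed:.1f}s")
```

The sparser the keyframes in a file, the further a fast seek can drift from the time you asked for.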

Advanced

Don’t touch this unless you have a reason to.

To set max network streaming resolution, set YDLMaxHeight to the vertical resolution (e.g. 1080 for 1080p).

LAV Filters

See Internal Filters on how to access it.

LAV V

Set Hardware Acceleration to DXVA2 (copy-back) if you plan to use external filters such as SVP 4. Copy-back means the decoded frames are copied back into system memory, where external filters can process them.

LAV A

Read guides if you need to know more about audio setup. For those like me using stereo, the default options should suffice. You may want to enable DRC (dynamic range compression) so downmixed surround sound streams won’t have abrupt jumps in loudness.

LAV S

1f0.de has explanations for advanced options. Explanation of my setup:

  • jpn:eng loads eng subs for jpn audio as top priority. (I technically don’t need this first line due to my 2nd line, but I’m too lazy to edit it.)
  • *:eng loads eng subs for any audio as 2nd priority.
  • *:*|d loads anything, preferably anything marked as default, as 3rd priority.
  • ; is a separator.
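
To make the priority order concrete, here is a small Python sketch of how a rule string like mine could be evaluated. This is only my simplified model, not LAV Splitter’s actual code. It assumes rules are tried left to right and that the d flag acts as a filter for tracks marked default. The function names and track lists are hypothetical.

```python
def parse_rules(spec):
    """Split 'audio:subtitle|flags' rules separated by ';' into tuples."""
    rules = []
    for rule in spec.split(";"):
        audio, _, rest = rule.strip().partition(":")
        subtitle, _, flags = rest.partition("|")
        rules.append((audio, subtitle, flags))
    return rules

def pick_subtitle(spec, audio_lang, subtitles):
    """Return the first subtitle track matched by the highest-priority rule.

    subtitles is a list of (language, is_default) tuples.
    Assumption: the 'd' flag only accepts tracks marked as default.
    """
    for audio, subtitle, flags in parse_rules(spec):
        if audio not in ("*", audio_lang):
            continue
        for lang, is_default in subtitles:
            if subtitle not in ("*", lang):
                continue
            if "d" in flags and not is_default:
                continue
            return lang, is_default
    return None

SPEC = "jpn:eng;*:eng;*:*|d"
print(pick_subtitle(SPEC, "jpn", [("ger", True), ("eng", False)]))  # matched by jpn:eng
print(pick_subtitle(SPEC, "jpn", [("ger", True), ("fre", False)]))  # falls through to *:*|d
```

The real filter supports more than this sketch models, so check the 1f0.de documentation before relying on it.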

MadVR

Ctrl+J allows you to monitor decode info for dropped frames. Remember to also monitor system usage, and try not to exceed ~70% CPU/GPU usage if possible. Note that AV1 decoding and HDR tone mapping use a TON of resources.

Go to the system tray and double click the MadVR icon to open settings.

Devices

In device settings you can access monitor related settings. Adjust the settings to suit your device (e.g. PC or TV color levels, HDR settings, etc.). You can also access monitor refresh rate settings here, though if you are not using SVP there really isn’t much reason to use this.

Processing

Activate auto deinterlacing if you watch ancient DVD content. Otherwise, turn it off as Anime sometimes triggers false positives.

Artifact removal:

  • If you don’t plan on using my provided deband shader, feel free to use the MadVR one. I prefer the mpv ported one as imo it’s the best written deband shader out there.
  • Reduce compression artifacts can be used if you mainly stream your anime from Crunchyroll or other Web sources. Useful but GPU intensive!
  • Reduce random noise helps remove some dynamic grain and video noise. Useful but GPU intensive!

Image enhancement setting here is global. I prefer not using any filters.

Scaling Algorithms

Chroma Upscaling: Jinc should be sufficient for mid-range GPUs. Only use NGU if you have a really high-end GPU. Spline is a good quality fallback that uses far fewer resources than Jinc. Stick with Bilinear if you have a potato PC. Remember to activate the anti-ringing filter when applicable.

Image Downscaling: SSIM for decent systems. Otherwise Lanczos or Spline are good fallbacks. Honestly even Bilinear is fine, as there aren’t many situations where you’ll downscale videos. The only scenario I can think of is stuffing the window in a corner for background playback while doing other stuff, in which case quality isn’t important.

Image Upscaling: Most important, so spend your GPU resources here first. Bilinear for potato PCs. Bicubic75 for iGPU systems if able. Spline for weak systems. Jinc for mid-range GPUs. NGU for higher-end systems that require resolution doubling (1080p -> 4K). Remember to activate the anti-ringing filter when applicable.

Upscale Refinement: Post-upscale filtering. Some people like it, some don’t. Personally, I’m not a fan. For anime, Adaptive Sharpen and Thin edges should be the most relevant settings.

Deep Dive into Scaling Algorithms

Explaining each algorithm in detail is going to take too much time. Instead, let me redirect you to my mpv external-shaders guide for more info. Scroll down to the upscalers demo. I’ll explain the relevant ones:

ewa_lanczos is Jinc-windowed Jinc, a.k.a. Jinc. mpv has sharp and soft tweaked variants, but MadVR only has the default one. Jinc is probably as good as it gets without getting into more advanced NN scalers. Recommended for mid-range GPUs (GTX 1660 or better).

Next in line is spline. MadVR’s spline 3-tap should be spline36, and 4-tap should be spline64. It uses far fewer system resources and is close in quality to Jinc, with a tad more ringing artifacts. Recommended for entry-level GPUs (~GTX 1650 tier).

MadVR has the ability to use various Bicubic upscalers. I like to use Bicubic75 for my iGPU laptops, but other sharper variants also exist. Bicubic is 1 step down from spline and is suitable for laptops with iGPUs if spline is unable to run smoothly. Some older machines (e.g. 1st gen 4K laptops) might still struggle with Bicubic75.

Bilinear or DXVA2 can be used on potato PCs.

NGU is a proprietary doubling (2x) scaler. The sharp variant is similar to FSRCNNX. AA variant is similar to NNEDI3. Standard is in the middle and my personal recommendation. I recommend NOT using NGU unless you can run it at high or very high quality without dropping frames.

Resource Prioritization

CPU/GPU resources are limited, so the most important thing is to spend them where they have the most impact. Here is my priority chart:

System Tier                     | Image Upscaling | Chroma Upscaling | Image Downscaling | Image/Artifact Enhancement
Potato                          | Bilinear        | Bilinear         | Bilinear          | None
iGPU Laptops                    | Bicubic         | Bilinear         | Bilinear          | None
Modern Laptops ~ GT 1030+ Tier  | Spline          | Bicubic          | Bilinear          | None
~ GTX 1650                      | Jinc            | Spline           | Bilinear          | None
~ RTX 2060                      | Jinc            | Jinc             | Lanczos           | Lightweight Options
~ RTX 2060 Alternate            | NGU             | Jinc             | SSIM              | None
Powerful Systems                | Sky Is The Limit

Rendering

General rendering options. Leave at default for the most part.

Leave Smooth Motion off; if you want interpolation, get SVP.

Dither – I prefer Ordered Dithering.

Trade Quality for Performance – Not as huge an impact as you might expect. It’s not gonna magically make your potato play 4K videos.

SVP 4

Consult my SVP 4 guide for SVP specific options.

Conclusion

Thanks for reading!

SVP 4 Setup Guide for Smooth “60 FPS” Anime Playback

Now that I’ve got your attention with the “60 FPS” title, please don’t ever set SVP to 60 FPS unless absolutely necessary (explained later).

This guide will teach you how to properly configure SVP 4 specifically for anime. I included configurations for both people who hate frame-blending interpolation, and those who prefer a smoother video experience.

What is SVP 4?

For those who don’t know what SVP (Smooth Video Project) is, it’s basically a program that interpolates videos to a higher frame rate for smoother playback experience. A license costs $25 which is rather pricey, although they do have a 30 day trial so test it out to see if you think it’s worth it. Linux users rejoice, you can use it for free.

For those who stream anime, SVP also has an SVPtube plugin that enables you to stream directly to your mpv player with SVP enabled.

Demos

(RANT) WHY DaFuQ DOES WORDPRESS STRIP <button> and <iframe> HTML CODE???

What Works Best with SVP

SVP works best for anime with smooth animation spanning every frame. This means that scene panning, 3D CGI camera movement/panning and CGI models work best (e.g. Demon Slayer, FMP – Invisible Victory, etc.). Unfortunately, many hand-drawn action scenes aren’t true 24 FPS, and SVP struggles to interpolate them, resulting in artifacts and lots of frame blending. Fortunately, it is possible to “filter” these scenes out (adaptive mode) and disable interpolation for them, which I go through later. A common mistake I often find in SVP guides is people trying to force interpolation in such scenes, resulting in an even worse viewing experience imo.

1080p content also works much, much better than 720p. Sometimes I get annoyed that SVP refuses to interpolate what anyone would think is an “easy” scene, only to realize the 1080p version works fine. Quality also matters: web streams tend not to work as well as Blu-rays, especially at 720p.

Why should I never set SVP to 60 FPS?

The problem is illustrated in the following code block:

• = Monitor Frame Timing
F = Original 24 FPS Video Frame 
- = Frames to be Interpolated

•••••••••••••••••••••••••••••• (60 Hz Monitor)
F--F-F--F-F--F-F--F-F--F-F--F- (24 FPS Video)

•••••••••••••••••••••••••••••• (48 Hz Monitor)
F-F-F-F-F-F-F-F-F-F-F-F-F-F-F- (24 FPS Video)

As 99.9% of anime is 24 FPS, interpolating to 60 FPS results in unevenly spaced interpolated frames. This causes an annoying judder during fast panning. While 60 FPS at first glance may seem like a huge improvement, once you try setting your monitor to 48 Hz you’ll never go back to 60 FPS. As for Mac users, Apple locks your screen at 60 Hz (RIP).
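
The same cadence can be worked out with a few lines of Python. This is just a sketch of the arithmetic, assuming each source frame appears on the first monitor refresh at or after its timestamp. The function name is mine.

```python
def cadence(source_fps, display_hz, source_frames=6):
    """Return how many display refreshes each source-frame interval spans."""
    # Refresh index on which each source frame first appears (integer ceiling division).
    starts = [-(-(i * display_hz) // source_fps) for i in range(source_frames + 1)]
    return [b - a for a, b in zip(starts, starts[1:])]

for hz in (60, 48):
    spans = cadence(24, hz)
    verdict = "even" if len(set(spans)) == 1 else "uneven"
    print(f"{hz} Hz: {spans} -> {verdict}")
# 60 Hz: [3, 2, 3, 2, 3, 2] -> uneven
# 48 Hz: [2, 2, 2, 2, 2, 2] -> even
```

At 60 Hz the gap between real frames alternates between 3 and 2 refreshes, which is the judder. At 48 Hz every gap is exactly 2.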

Setting up SVP 4, Monitor, and System Recommendations

Read the documentation on how to set up SVP with your specific player. It’s super easy on Windows. Mac/Linux users will need to compile a vapoursynth enabled mpv build (also really simple). I personally recommend mpv (IINA on Mac) for all platforms, and MPC + MadVR on Windows for the lazy. MadVR is for those who don’t want to spend the time tinkering with scripts, configs and shaders. mpv is for those who want full control over the player (plus it’s also much more efficient, so I recommend it for weaker systems). Psst. I have a mpv setup guide you can find on the side bar/menu.

Important: Because anime is often encoded in 10-bit, go to application settings > frc > color > allow 10-bit and set it to true. Otherwise you will experience terrible color banding due to SVP not dithering 10-bit to 8-bit properly.

For those with 120 Hz monitors as long as you got a good CPU/GPU you’re gucci. For 60 Hz monitor plebs like me, you got 2 options.

  1. Overclock to 72Hz (a multiple of 24). Use this UFO tool to check for frame skipping. If successful, you’re good. (Those with TVs that have “Smooth motion / sport mode” might be able to make 120Hz work from PC input; for 4K @120Hz you need HDMI 2.1.) Keep in mind that on weaker laptops you might be forced to use 48Hz instead, as your CPU might not be able to keep up with 72fps.
  2. Set up 48Hz (you might need to use Custom Resolution Utility (CRU) or NVidia/AMD’s GPU control panel to create a 48 Hz profile).

As for minimum system specs, I do not recommend going any lower than 4/8 U-series (~2.5+ GHz all-core sustained; sufficient for 48fps) CPUs and Intel Iris (1x deband + no upscale/video filter) or MX150 GPU (basic upscale filter + 2x deband). For optimal playback I recommend 6/6+ desktop-class CPU (4.0GHz+ all-core sustained; powerful enough for 120fps) and a GTX 1050+ (can run advanced upscale shader + 2-4x deband + maybe 1 video filter).

Bonus: Mpv users have access to the high quality in-built deband shader. MPC users – here is the ported hlsl shader from mpv if you dislike the MadVR one.

Remember to check for frame drops! Ctrl+J for MPC and Shift+I for mpv. Try not to exceed 70-80% usage on either CPU or GPU during fullscreen playback; this leaves some buffer, and anything higher usually starts dropping frames.

Monitor Refresh-Rate: Extended

Some simple math:

Multiples of 24fps: 24      48      72      96  120 144
Multiples of 48fps:         48              96      144
Multiples of 30fps:     30      60      90      120
Multiples of 60fps:             60              120

120Hz is the sweet spot because it is the least common multiple of 24, 30 and 60, the most common video frame rates. While interpolating to 120fps does make videos silky smooth, having 4 interpolated frames between true frames does cost some image sharpness. If you dislike this and wish to only interpolate to 48fps (double the frame rate of 24fps videos), you should set your monitor to 144Hz, which most high refresh-rate monitors support anyway.
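
If you want to check your own monitor, here is a short Python sketch of the math above. The helper names are mine. It lists which interpolation targets are both a multiple of the source frame rate and an even divisor of the refresh rate.

```python
from functools import reduce
from math import gcd

def lcm(*values):
    """Least common multiple of any number of integers."""
    return reduce(lambda a, b: a * b // gcd(a, b), values)

def clean_targets(refresh_hz, source_fps):
    """Target frame rates that are multiples of source_fps and divide refresh_hz evenly."""
    return [t for t in range(source_fps, refresh_hz + 1, source_fps) if refresh_hz % t == 0]

print(lcm(24, 30, 60))  # 120
for hz in (60, 72, 120, 144):
    print(hz, clean_targets(hz, 24))
# 60 []
# 72 [24, 72]
# 120 [24, 120]
# 144 [24, 48, 72, 144]
```

Note how 48 only shows up for 144Hz in this list, and how 60Hz has no clean target for 24fps content at all.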

How to Quickly Switch Refresh Rate (for non-MadVR/MPC Users)

Display Changer II (dc2) is a simple utility that allows changing refresh rate with command line. Custom Resolution Utility (CRU) is a utility that allows adding custom monitor profiles to the system. If you’re on desktop you may also add custom resolution in NVidia/AMD’s control panel. After creating your 48Hz profile in CRU / GPU control panel (if it doesn’t exist in the first place), switch to it and run dc2 -create="48.xml". Then switch back to 60Hz and run dc2 -create="60.xml". Next, create a similar bat file as follows:

:: Set the dc2 directory to where yours is.
@echo off
"C:\Tools\dc2\dc2.exe" -configure="C:\Tools\dc2\48.xml" -temporary
echo Changed refresh rate to 48 Hz. To restore 60 Hz please press any key...
pause >nul
"C:\Tools\dc2\dc2.exe" -restore

Lastly, create a shortcut with similar code as follows to the bat file and pin it wherever you want.

cmd /c "C:\Tools\dc2\RRto48.bat"

It is also possible to use AutoHotKey or even PowerShell to create a more advanced version of this, but a simple bat file should suffice for most people. It also stays open to remind you that you’ve changed your refresh rate until you press a key to revert.

SVP 4 Options Overview

Finally to the juicy details. This section is done in a summary format. I had demos and full length explanations done but decided against using it as it gets way too long.


  • Function: Relevant/Available Settings (Bold = Recommended Settings)
    • Description

  • Frames Interpolation Mode: 2m, 1.5m, 1m, Uniform, Adaptive
    • Always set to Adaptive for Anime. 2m has least artifacts, uniform is most fluid but most artifacts. Adaptive mode gives each frame a score of “good”, “bad”, or “worst”, and each scenario uses the interpolation mode set in adaptive pattern setting.
  • Adaptive Pattern: Uniform-1-1.5, Uniform-1-2, 1-1.5-2, 1-2-2
    • Frame interpolation mode to do on “good”-“bad”-“worst” scenarios. Set to Uniform-1-2 for frame-blending shaders, or 1-2-2 if you hate artifact/excess frame-blending. Always set to 1-2-2 for non-frame-blending shaders. 1-1.5-2 is an intermediate of both, viable for content with complex texture.
  • SVP Shader: Fastest, Sharp, By Blocks, Simple Lite, Simple, Standard, Complicated
    • Algorithm to use for interpolation. Fastest & Sharp do not frame-blend; Sharp is the higher quality of the two. Remember to set adaptive pattern to 1-2-2, since both of these algorithms create terrible puddle artifacts on non-optimal scenes and should only be used on “good” scenarios. By Blocks through Complicated work similarly to each other (frame-blending techniques), and quality increases as you go down the list. (Technically they differ in masking techniques, but real-world results are almost identical, with the only difference being their quirks with artifacts.) Frame-blending shaders combine frame-blending and non-frame-blending methods to interpolate, creating a smoother experience at the cost of potentially more artifacts. Set to Fastest/By Blocks depending on preference, and if processing power allows, use the higher-quality counterpart Sharp/Standard. If you interpolate to more than 2x the frame rate, do not use the Fastest/Sharp shaders.
  • Artifact Masking: Disabled, Weakest, Weak, Average, Strong, Strongest
    • Stronger masking = more blending into the next frame. Depends on personal preference. Keep in mind stronger masking may give a smoother experience in complex-textured scenes but may introduce ghosting on fast objects and in fast-panning simple scenes. If you dislike the ghosting/blending effect typically seen on TV “sport mode”, turn artifact masking off. A “weak” artifact masking is sufficient to mitigate ‘tearing’ effects caused by interpolation errors for small static objects / credits text on a panning scene.
  • Motion Vectors Precision: Half, One, Two
    • Precision in interpolating vectors. Set to half if processing power allows. Only set to two if on low-end system.
  • Motion Vectors Grid: Small: 6, 7, 8 Medium: 12, 14, 16 Large: 24, 28, 32
    • Block size for motion vector prediction. Smaller = more CPU usage. Bigger = smoother panning but may overlook details, and also abandons interpolation more easily on “bad/worst” scenarios; Smaller = better interpolation for smaller details but potentially more artifacts/halos due to “false positives”. Sane values are 16-32, with 24 or 28 recommended for 1080p. Remember to multiply the value by 2/3 for 720p content for the same effect. For really high-end CPUs, another strategy you may want to experiment with is to use the smallest grid sizes (6, 7, 8) to make the artifacts as small as possible, then use artifact masking to cover them up.
  • Decrease Grid Step: Disabled, Local, Global
    • Refines Interpolation further, especially useful on smaller vector grid values. Set to global if processing power allows.
  • Search Radius: Small & Fast, Small, Average, Large
    • Larger = better at finding huge movements at the cost of artifacting (“stretching” effect). Small & Fast recommended. Small-average is viable for frame-blending shaders.
  • Wide Search: Disabled, Average, Strong, Strongest
    • Last-ditch effort at finding motion vector. Disabled recommended for 1-2-2 adaptive mode, disabled or average for Uniform-1-2 mode.
  • Width of Top Coarse Level: Small, Average, Large
    • Just leave this on average. Read wiki for more details. Set to large for weaker systems.
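
For the 2/3 rule under Motion Vectors Grid, here is a tiny Python sketch. Snapping to the nearest grid size that SVP actually lists is my own assumption, since 2/3 of a value rarely lands exactly on one. The function name is hypothetical.

```python
# Grid sizes listed under Motion Vectors Grid above.
GRID_SIZES = [6, 7, 8, 12, 14, 16, 24, 28, 32]

def scaled_grid(grid_1080p, video_height):
    """Scale a grid size tuned for 1080p and snap it to the nearest listed value."""
    target = grid_1080p * video_height / 1080
    return min(GRID_SIZES, key=lambda size: abs(size - target))

for grid in (24, 28, 32):
    print(grid, "->", scaled_grid(grid, 720))
# 24 -> 16
# 28 -> 16
# 32 -> 24
```

So if you run 24 or 28 on 1080p, 16 is the closest equivalent for 720p content.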

Recommendations

Now onto my recommendations for each type of system and preference.

Setting A: Toaster Edition

Toaster Edition Recommended Settings (No Frame Blending/Frame Blending)
Pros: Minimal Resource Usage
Cons: Easy to spot imperfections (artifacts/puddles)
Frames Interpolation Mode: Adaptive
Adaptive Pattern: 1-2-2
SVP Shader: Fastest
Artifact Masking: Disabled/Average
Motion Vectors Precision: Two
Motion Vectors Grid: 32
Decrease Grid Step: Disabled
Search Radius: Small & Fast
Wide Search: Disabled
Width of Top Coarse Level: Average/Large
Rendering Device: No Change/CPU/GPU (Leave at no change unless you have issues, then test either CPU or GPU)

Now for those who absolutely hate frame-blending and artifacts, the goal is to let SVP abandon interpolation in anything other than “good” scenarios. The downside of this particular setting is that if a panning scene has too much texture/detail, it will fail to interpolate and display the original. Additionally, if SVP incorrectly classifies such a “bad” scene as “good”, the tear/puddle artifacts look horrible.

Please note that you absolutely MUST have your monitor refresh rate at 2x the video fps for this specific config, or there will be constant interpolation judder/blending in an attempt to sync to 60 Hz. A refresh rate of more than 2x may also judder due to the fast/sharp algorithm being bad at interpolating more than 1 intermediate frame.

Setting B: I Hate Soap Opera Edition

Recommended Settings: No Frame Blending, Very Minimal Artifacts (basically only smooth panning/motion scenes)
Pros: Moderate Resource Usage, Only interpolates when confident, No frame-blending (sharper image), No change from original if interpolation fails (SVP will abandon interpolation, displaying a duplicate)
Cons: Complex texture/CGI panning scenes will fail to interpolate (due to 1-2-2 adaptive pattern), Occasional smooth-to-choppy judder when a complex object enters a panning scene, causing a sudden “good” -> “bad” switch with SVP abandoning interpolation, Foreground text and fast small objects in fast-panning scenes may occasionally “tear/puddle”, MUST set monitor refresh rate to a multiple of the video frame rate.
Frames Interpolation Mode: Adaptive
Adaptive Pattern: 1-2-2
SVP Shader: Sharp
Artifact Masking: Disabled / Weak (Turning on masking helps mitigate small warping errors at the cost of ghosting/frame-blending on fast objects)
Motion Vectors Precision: Half or One (Setting to One may avoid some small interpolation errors, but Half gives better accuracy)
Motion Vectors Grid: 32
Decrease Grid Step: Global
Search Radius: Small & Fast
Wide Search: Disabled
Width of Top Coarse Level: Large

For those who don’t mind a bit of frame-blending and some occasional artifacts for a “perceived” smoother experience. This is my favorite setting.

Those who like the Standard algorithm but want something less aggressive could use Setting B above with the algorithm replaced and artifact masking disabled.

Keep in mind a “smoother” setting tends to introduce more artifacts, halos/ghosting, more frame-blending, etc.

Setting C: Regular Edition

Minimal Frame-Blending Recommended Settings (Bold = my recommendation)
Pros: Less noticeable “puddle” artifacts, Eliminates “tearing” effect, Complex scenes slightly smoother from the frame-blending interpolation
Cons: Still fails to interpolate the most complex/CGI/texturized fast motion scenes (for 1-2-2 adaptive pattern)
Frames Interpolation Mode: Adaptive
Adaptive Pattern: 1-2-2/1-1.5-2 (1-1.5-2 for slightly more aggressive interpolation)
SVP Shader: Standard
Artifact Masking: All viable depending on preference, recommend Disabled/Weak
Motion Vectors Precision: Half or One (Setting to One may avoid some small interpolation errors, but Half gives better accuracy)
Motion Vectors Grid: 24/28
Decrease Grid Step: Global
Search Radius: Small & Fast
Wide Search: Disabled/Average
Width of Top Coarse Level: Average

This last set of settings should be the best in terms of “smoothness”. It also allows SVP to more successfully interpolate complex pan/CGI motion camera scenes at the cost of more artifacts (mostly in the form of frame-blending).

Setting D: Sport Mode, but Actually Decent Edition

Frame-Blending Recommended Settings (Bold = my recommendation)
Pros: Complex CGI camera can successfully be interpolated, Frame-blending tricks eyes for smoother experience
Cons: Archenemy of those who hate frame-blending, May experience weird “ghosting” effect on fast moving/small objects
Frames Interpolation Mode: Adaptive
Adaptive Pattern: Uniform-1-2
SVP Shader: Standard
Artifact Masking: All viable depending on preference, recommend Average
Motion Vectors Precision: Half or One (Prefer half if performance allows)
Motion Vectors Grid: 16-28 depending on preference, recommend 24
Decrease Grid Step: Global
Search Radius: Small & Fast
Wide Search: Disabled/Average
Width of Top Coarse Level: Average

My personal favorite is Setting C, switching artifact masking between Weak and Disabled depending on the anime.

Conclusion

Thanks for reading! This guide took quite a while to complete. Let me know if you have any questions!

Bonus Stuff

Here are some good clips to use to configure SVP 4.

Test clip.

  • Cop Craft OP: Lots of colors and action, good for tuning frame-blending/ghosting effect.
  • Cop Craft ED: Good for tuning smooth panning for complex textures w/ overlaid text.
  • Cop Craft 07 Intro: The first scene panning up the building, and the following scene panning across the *cough* room with moving objects may be a challenge for some settings.
  • Danmachi Oratoria OP: Lots of panning scenes with detailed texture. Diagonal panning may pose a challenge for “1-2-2” adaptive mode as it classifies many of these frames as “bad” scenarios.
  • Danmachi Orion no Ya: Complex lighting/colors panning. The intro scene in the caves, and panning objects, across the sky, and stars are good for testing motion vector grid sizes.
  • Danmachi S1 01: The intro portion is good for configuring motion vector settings.
  • Danmachi S1 OP: General panning/zoom panning/motion panning.
  • Danmachi S2 04: Starting from the familia battle you can tune motion vector settings. The sword fight is good for tuning motion vector options to prevent artifact forming on fast moving thin objects.
  • Danmachi S2 OP: Good for testing rendering options and shaders. The fast camera movement and foreground text credits are great for testing out shader options especially for text tearing.
  • Katanagatari OP1: (Not in test clip) Lots of textured 2D objects moving around, good for testing vector size settings.
  • Maou-sama, Retry! 06: The purple haired girl raises her hand, then proceeds to roll around. The low-budget anime frame rates absolutely destroy SVP. This is a great scene to test out settings for SVP to “give up” interpolation. Relevant options are motion vector grid size (bigger = gives up interpolation more easily), range, and adaptive mode. You may also test out frame-blending shaders here as this specific scene is extremely unfriendly to these shaders.
  • NGNL Zero CGI motion: The motion camera CGI through the complex textured valley is very good for testing adaptive patterns and frame-blending thresholds. “1-2-2” adaptive mode will struggle very hard here. Using “Uniform-1-2” with the Standard shader is recommended. This particular scene is good for tuning your options for a very complex/detailed/textured motion for “bad” scenario frames.
  • NGNL Zero Rolling Credits: The end rolling credits, as you guessed, is to tune scrolling text. Adjusting motion vector precision settings here should help.

mpv Configuration Guide for Watching Videos

Foreword

I also have a guide for setting up MPC-HC and MadVR if you’re less into tweaking. Other than being slightly less resource efficient at the lower end, imo MPC paired with MadVR is just way more convenient to use.

Table of Contents

These are my recommended settings. Adjust the settings suitable to your system.

Get mpv at mpv.io or custom mpv build w/ vpy enabled configured by me. Just extract the archive to a desired directory. I recommend not using a directory that requires admin/superuser privileges. For Windows, an install script should be included but is not required, as all it does is register mpv for file association. For Mac, there’s also IINA which has a prettier UI although some advanced options may not work and may require more tweaking for power users.

The mpv.conf file can be found in %APPDATA%\mpv on Windows and ~/.config/mpv/ for Linux/Mac. Alternatively, \portable_config\ folder can be used in the mpv executable directory on Windows to make it portable, and will take highest priority over %APPDATA%\mpv.
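As a rough sketch of the lookup order described above (this is my simplification, not mpv's actual code; the function name and arguments are made up for illustration, and mpv has other overrides that are ignored here):

```python
import os
from pathlib import Path


def mpv_config_dir(exe_dir, is_windows, env):
    # Hypothetical helper: returns the folder mpv.conf is expected in,
    # following only the rules from the paragraph above.
    if is_windows:
        portable = Path(exe_dir) / "portable_config"
        # portable_config next to the executable wins over %APPDATA%/mpv.
        if portable.is_dir():
            return portable
        return Path(env.get("APPDATA", "")) / "mpv"
    # Linux/Mac: ~/.config/mpv
    return Path(env.get("HOME", "")) / ".config" / "mpv"


# Where this sketch would look on the current machine.
print(mpv_config_dir(os.getcwd(), os.name == "nt", os.environ))
```

If the printed folder doesn't exist yet, just create it and drop your mpv.conf in there.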

Read the documentation for all available commands.

To illustrate how the config file works:

# This is a comment.
#inactive-config
active-config # This is a comment.
active-option-value=111
active-option-value=222 # active-option-value is now set to 222 instead of 111. 
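
If it helps to see the rules spelled out, here is a toy reader for the snippet above. It is a deliberately simplified sketch (no quoting, no profiles, nothing like mpv's real parser) and the function name is hypothetical; it only shows that comments are dropped, a bare option counts as "yes", and a later line overrides an earlier one.

```python
def parse_mpv_conf(text):
    # Toy reader for mpv.conf-style text; ignores quoting and profiles.
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        key, sep, value = line.partition("=")
        # A bare flag such as "active-config" is treated as "yes".
        options[key.strip()] = value.strip() if sep else "yes"
    return options


sample = """\
# This is a comment.
#inactive-config
active-config # This is a comment.
active-option-value=111
active-option-value=222
"""
print(parse_mpv_conf(sample))
```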

If you are still using notepad to edit files, I recommend using VSCode, Notepad++ or similar software to edit config files to make your life easier.

General mpv Options

Enable this section if you use SVP 4 (requires an mpv build with Vapoursynth enabled).
I also have an SVP 4 guide if you’re interested.

#input-ipc-server=/tmp/mpvsocket # *nix only
input-ipc-server=mpvpipe # Windows only
hwdec=auto-copy
hwdec-codecs=all
hr-seek-framedrop=no
no-resume-playback

Built-in high quality profile. Enables better upscaling algorithm. Recommend GPU minimum of newer Intel Iris iGPU, AMD VEGA 8 iGPU, or NVidia MX150.

profile=gpu-hq

(Optional) Enables the newer Vulkan API. Disable it if you run into compatibility/performance issues. May improve performance when running complex shaders.

gpu-api=vulkan

Override default language selection order:

alang = 'jpn,jp,eng,en'
slang = 'eng,en,enUS' # enUS for Crunchyroll.

Some other UI options:

keep-open=always # Prevents playlists from auto-advancing. Set to 'yes' to auto-advance. Both "always" and "yes" prevent the player from closing when playback completes.
reset-on-next-file=pause # Resumes playback when skip to next file

window-scale=1.5 # Set video zoom factor.
autofit-larger=1920x1080 # Set max window size.
autofit-smaller=858x480 # Set min window size.

no-osd-bar # Hide OSD bar when seeking.
osd-duration=500 # Hide OSD text after x ms.
osd-font='Trebuchet MS'
#osd-font-size=24

Configure network/youtube-dl options:

ytdl-format=bestvideo[height<=?1080]+bestaudio/best # Set max streaming quality as 1080p.
# Default demuxer is 150/75 MB, note that this uses RAM so set a reasonable amount.
demuxer-max-bytes=150000000 # 150MB, Max pre-load for network streams (1 MiB = 1048576 Bytes).
demuxer-max-back-bytes=75000000 # 75MB, Max loaded video kept after playback.
force-seekable=yes
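
Since the values above are written in plain bytes, here is a small conversion sketch to sanity check how much RAM you are actually handing over. The helper names are mine; the only fact used is the 1 MiB = 1048576 bytes figure from the comment above.

```python
MIB = 1048576  # bytes in 1 MiB


def to_mib(n_bytes):
    return n_bytes / MIB


def from_mib(mib):
    return int(mib * MIB)


for name, value in [("demuxer-max-bytes", 150_000_000),
                    ("demuxer-max-back-bytes", 75_000_000)]:
    print(f"{name}={value}  # {value / 1e6:.0f} MB = {to_mib(value):.1f} MiB")
```

So the "150MB" above is a bit under 150 MiB; use from_mib(150) if you want an exact MiB figure instead.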

Screenshot:

screenshot-format=png
screenshot-high-bit-depth=yes
screenshot-png-compression=7 # Setting too high may lag the PC on weaker systems. Recommend 3 (weak systems) or 7.
screenshot-directory="%USERPROFILE%\Pictures\mpv"

Video Config

Denoise filter. Recommend keeping it off unless watching CRT era stuff.

#denoise=1

Deband filter. Always turn on for anime.

deband=yes # Default values are 1:64:16:48

Deband parameters configuration. For Anime, 2:35:20:5 recommended for general use. Use 3:45:25:15 for older DVD, badly mastered BD or WEB streams. Use 4:60:30:30 for really, really bad streams.

deband-iterations=2 # Range 1-16. Higher = better quality but more GPU usage. >5 is redundant.
deband-threshold=35 # Range 0-4096. Deband strength.
deband-range=20 # Range 1-64. Range of deband. Too high may destroy details.
deband-grain=5 # Range 0-4096. Inject grain to cover up bad banding, higher value needed for poor sources.

Note: For older/weaker iGPUs, instead of increasing deband-iterations you may need to increase deband-threshold instead if you need a stronger effect. I recommend 1:60:25:30 if you absolutely must run 1 iteration (lower quality but much less GPU usage).
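
The a:b:c:d shorthand used in this section is just iterations:threshold:range:grain. A tiny sketch (the helper name is hypothetical) that expands the shorthand into the matching mpv.conf lines:

```python
DEBAND_KEYS = ("deband-iterations", "deband-threshold",
               "deband-range", "deband-grain")


def expand_deband(shorthand):
    # Turn e.g. "2:35:20:5" into the four matching mpv.conf lines.
    values = shorthand.split(":")
    if len(values) != len(DEBAND_KEYS):
        raise ValueError("expected iterations:threshold:range:grain")
    return [f"{key}={int(value)}" for key, value in zip(DEBAND_KEYS, values)]


for preset in ("2:35:20:5", "3:45:25:15", "4:60:30:30", "1:60:25:30"):
    print(preset, "->", ", ".join(expand_deband(preset)))
```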


Dither:
Set to auto for Anime due to 8 and 10 bit encodes. Set to no if your monitor has built-in dither (just leave it at auto if you aren’t sure).

dither-depth=auto

Audio Config

volume=100 # Set volume to 100% on startup.
volume-max=100 # Set player max vol to 100%.

Subtitle Config

Enable if subs are broken or you need legacy ssa support.

demuxer-mkv-subtitle-preroll=yes
#sub-ass-vsfilter-blur-compat=yes
#sub-fix-timing=yes

Enable to modify PGS subs.

#sub-gauss=0.5 # Blur PGS subs.
#sub-gray=yes # Monochrome subs (makes yellow font grey).

Allow loading external subs that do not match file name perfectly.

sub-auto=fuzzy

Default subtitle font when none are specified.

sub-font='Trebuchet MS'
sub-bold=yes # Set the font to bold.
#sub-font-size=55 # Set default subtitle size if not specified.

Advanced Video Scaling Config

Internal Scalers

Note: Press shift + I in mpv to view frame drops. Then press 2 to view frame times and processing layers. Make sure your config does not drop frames and ideally frame times should be <25ms.

scale is the luma upscale method. Prioritize resource on this over cscale.
dscale is the luma downscale method.
cscale is the chroma upscale method. Human eyes aren’t as sensitive to chroma compared to luma.

If you enabled profile=gpu-hq:

#Scaling algorithm for profile=gpu-hq
scale=spline36
dscale=mitchell
cscale=spline36

Default gpu-hq should be sufficient for most people, however, here are some suggestions for fine tuning:

I have a Toaster (crappy PC) edition:

scale=bilinear # Set spline16 if possible.
dscale=mitchell
cscale=bilinear

I have an iGPU laptop edition:

scale=spline36
dscale=mitchell
cscale=mitchell

I have a decent GPU (GTX 1050+) edition:

scale=ewa_lanczossharp
dscale=mitchell
cscale=spline36 # alternatively ewa_lanczossoft depending on preference

External Shaders

For those who have really good GPUs (GTX 1060+, sometimes need even better GPU) and want to run external shaders:

Remember to download the shader files and put them in the mpv config folder! ~~/xxx.glsl refers to xxx.glsl file in the mpv config directory.

Always enable profile=gpu-hq before using shaders for fallback.

Dynamic Scaler: SSSR
SSSR is a dynamic scaler that improves built-in scalers utilizing structural similarity. Upscale result varies widely depending on your scaler. You may freely choose your preferred one. Moderate to highly GPU intensive.

For a sharper image:

profile=gpu-hq
glsl-shader="~~/SSimSuperRes.glsl" # Set B C parameter to Robidoux.
scale=ewa_lanczossharp
dscale=mitchell
cscale=spline64

Alternatively set to haasnsoft for a softer look (much better at artifact/aliasing suppression). I found this combo to be very good for anime, and it performs close to NN-based upscalers.

profile=gpu-hq
glsl-shader="~~/SSimSuperRes.glsl" # Set B C parameter to Mitchell.
scale=haasnsoft
dscale=mitchell
cscale=ewa_lanczossoft

If you really want to run SSSR on a lower-end GPU, the full power version of MX150 (1D10 25W) should be able to run it if you use the faster algorithms. Results will still be better than spline36. Use bilinear/spline16/spline36 (whichever your GPU can handle without dropping frames) for a sharper image. For anime, if you want a softer anti-aliasing look like haasnsoft, use bicubic.

glsl-shader="~~/SSimSuperRes.glsl" # Set B C parameter to Robidoux for sharper image, else use Mitchell (slightly sharper) or Catrom (default).
scale=bicubic # or bilinear or spline16/36
dscale=mitchell
cscale=mitchell

You can modify the SSSR shader’s cubic B and C parameters (on line 31 and 79) to your needs. Default is Catrom (0,1/2). Set to Mitchell (1/3,1/3) or even Robidoux (0.3782,0.3109) for a sharper image. See this graph.
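
For the curious, B and C are the two knobs of the standard Mitchell-Netravali cubic filter family. The sketch below is not taken from the shader; it is the textbook formula, evaluated for the three (B, C) pairs named above so you can compare the kernel weights yourself.

```python
def bc_spline(x, b, c):
    # Mitchell-Netravali cubic kernel weight at distance x for parameters (B, C).
    x = abs(x)
    if x < 1:
        return ((12 - 9 * b - 6 * c) * x**3
                + (-18 + 12 * b + 6 * c) * x**2
                + (6 - 2 * b)) / 6
    if x < 2:
        return ((-b - 6 * c) * x**3
                + (6 * b + 30 * c) * x**2
                + (-12 * b - 48 * c) * x
                + (8 * b + 24 * c)) / 6
    return 0.0


PRESETS = {
    "Catrom": (0.0, 0.5),
    "Mitchell": (1 / 3, 1 / 3),
    "Robidoux": (0.3782, 0.3109),
}
for name, (b, c) in PRESETS.items():
    weights = [round(bc_spline(x, b, c), 4) for x in (0, 0.5, 1, 1.5)]
    print(name, weights)
```

The columns are the weights at 0, 0.5, 1 and 1.5 pixels from the sample. A higher center weight together with negative lobes further out generally means more acutance and more ringing, but how that plays out inside SSSR is something to judge by eye.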

2x Pre-Scaler: Neural Network Scalers
Some of these shaders require high-end GPUs to run smoothly. 2x pre-scaler means they scale a fixed 2x and should only be used if you have a 4K monitor (1080p x2 = 4K). For NN upscalers, more neurons = better quality but significantly more computational power.
Always activate profile=gpu-hq first.

profile=gpu-hq

Choose 1 of the 2x pre-scalers:

  • FSRCNNX is very good for general use / real-life content, may amplify artifacts (due to it being true to its source). Requires good source material. MadVR equivalent would be NGU-Sharp. Very GPU intensive.
glsl-shader="~~/FSRCNNX_x2_8-0-4-1.glsl"
  • RAVU Lite is relatively lightweight and decent for anime due to the lite version being sharper. I would use SSSR over this due to SSSR being better imo. Moderately GPU intensive.
glsl-shader="~~/ravu-lite-r4.hook"
  • NNEDI3 is designed for anime use. Result is very close to MadVR’s NGU-AA. Very GPU intensive.
glsl-shader="~~/nnedi3-nns32-win8x6.hook"

Remaining:

dscale=mitchell
cscale=spline64 # or ewa_lanczossoft (or your choice, really)

Dynamic Scaler for Anime: Anime4K
Anime4K is a unique upscaler + shader designed for Anime. It used to be very destructive, sharpens lines for a… unique “visual quality” look. It works extremely well in some anime and eh in others. Definitely isn’t for everyone. Version 0.9 had lots of flaws (mainly texture banding and line bloating). 1.0RC fixes a lot of the flaws (mainly bad banding) at the cost of doubling GPU consumption. 1.0RC2 optimized GPU usage (even more efficient than 0.9, 1080p > 4K should just barely run on a GT 1030) and also introduced 2 more speed profiles for even weaker systems (supposedly the fastest profile can run on iGPUs at the cost of shader quality).

The latest 1.0RC2 tones down the “filtered look” even more and serves as an upscaler + filter of {line thinning + line smoothening + sharpening} + artifact/texture-denoiser (reduces jpeg-like edges) combo (note: filter only active when upscaling). One down side to Anime4K compared to other upscalers is that due to line thinning, anime with bad aliasing effects looks even worse (stares at badly made isekai anime magic circles).

profile=gpu-hq
glsl-shader="~~/Anime4K_Adaptive_v1.0RC2.glsl"
dscale=mitchell
cscale=mitchell

For those interested in higher quality downscale:
SSIM downscaler is tuned to be used with mitchell. However, you shouldn’t be downscaling in the first place so just having dscale=mitchell fallback should be good enough.

glsl-shader="~~/SSimDownscaler.glsl" 
dscale=mitchell
linear-downscaling=no

For those interested in higher quality chroma upscale:
Imo cscale=spline36 is already very good. Human eyes aren’t that sensitive to Chroma compared to Luma. I won’t be surprised if you can’t even tell the difference when simply using cscale=mitchell. However, if you just want the best of the best and have the extra processing power, KrigBilateral is a high quality Chroma upscaler.

glsl-shader="~~/KrigBilateral.glsl"

I always advise testing out settings yourself, “quality” is extremely subjective. For example, in anime, destructive filters (denoise/de-ring/sharpen) upscalers may “look better” visually but not necessarily be true to its source (lower SSIM/PSNR).

If we go purely by measurement, haasnsoft should be a terrible upscaler (and it kinda is with normal content). However, when paired properly with SSSR it looks great in anime. Similarly, NNEDI3 looks better with Anime (as it’s trained for such) despite FSRCNNX being the technically better (more accurate) upscaler.

Anime “visual quality” greatly favors algorithms that de-aliases, denoises and sharpens lines to be defined. This is why Anime4K is such a controversial upscaler as it takes these concepts to an extreme.


In terms of algorithm quality (Note: better quality usually = more GPU usage):

abc = abc built-in shader
[xyz] = xyz external shader (requires downloading glsl/hook files)

For scale:
[NNEDI3] or [FSRCNNX] > [SSimSuperRes + haasnsoft/ewa_lanczossharp] > ewa_lanczossharp > spline36 > spline16 > bilinear
                <---------------------[Anime4K 1.0RC2] (personal preference)--------------------->

For dscale:
[SSimDownscaler] > Mitchell

For cscale:
[KrigBilateral] > spline64 / ewa_lanczossoft / spline36 > mitchell

Upscalers Demo (totally not stolen):
Open in new tab to view full size.
2x upscale from 100×54 to 200×108 image (Source: DanMachi).
NNS = nearest neighbor (also integer upscale in this case, essentially original image). mpv default is bilinear. Anime4K shown is older 1.0RC ver.
upscalers

Personal mpv.conf file

input-ipc-server=mpvpipe
hwdec=auto-copy
hwdec-codecs=all
hr-seek-framedrop=no
no-resume-playback

gpu-api=vulkan

alang = 'jpn,jp,eng,en'
slang = 'eng,en,enUS'

keep-open=always
reset-on-next-file=pause

window-scale=1.5
autofit-larger=1920x1080
autofit-smaller=858x480
no-osd-bar
osd-duration=500
osd-font='Trebuchet MS'

ytdl-format=bestvideo[height<=?1080]+bestaudio/best
demuxer-max-bytes=150000000 # 150 MB
demuxer-max-back-bytes=75000000 # 75 MB
force-seekable=yes

screenshot-format=png
screenshot-high-bit-depth=yes
screenshot-png-compression=3
screenshot-directory="%USERPROFILE%\Pictures\mpv"

deband=yes # Default 1:64:16:48
deband-iterations=2 # Range 1-16
deband-threshold=35 # Range 0-4096
deband-range=20 # Range 1-64
deband-grain=5 # Range 0-4096

dither-depth=auto

volume=100
volume-max=100

demuxer-mkv-subtitle-preroll=yes
sub-auto=fuzzy
sub-font='Trebuchet MS'
sub-bold=yes

profile=gpu-hq
glsl-shader="~~/shaders/SSimSuperRes.glsl"
scale=haasnsoft # or ewa_lanczossharp
dscale=mitchell
cscale=ewa_lanczossoft # or spline64

input.conf Config

You may create an input.conf in the same directory as mpv.conf to override default shortcuts. Here is a cheatsheet for the default shortcuts:

shortcuts

Keep in mind that depending on your keyboard layout, you may need to modify the file accordingly. For example, the = key has been ‘incorrectly’ mapped as + in the default shortcuts (e.g. alt + = for zoom is mapped as alt + +, which won’t work on the US keyboard). This is because in some countries the = key is the + key, whereas on the US keyboard they share a key and + is actually shift + =.

Basically when mapping, a means pressing a, but A means shift + a.

To fix the incorrect zoom shortcut:

Alt+- add video-zoom -0.1
Alt+= add video-zoom 0.1

To modify skip durations and add custom skip shortcuts:
Note that keyframe seeking is much faster (no render lag) but may not be exactly x seconds.

# Seek 5s exact, do not display OSD.
RIGHT no-osd seek  5 exact
LEFT  no-osd seek  -5 exact
# Seek 5s to the closest keyframe.
Ctrl+RIGHT  seek  5
Ctrl+LEFT   seek -5
# Seek 20s exact.
alt+RIGHT no-osd seek 20 exact
alt+LEFT no-osd seek -20 exact

Modify mouse wheel to control volume:

WHEEL_UP add volume 10 # In volume %.
WHEEL_DOWN add volume -10

Add audio delay hotkey:

ctrl+= add audio-delay 0.1 # In seconds.
ctrl+- add audio-delay -0.1

Add subtitle delay:

. add sub-delay +0.042 # 0.042s is 1 frame for a 24fps video
, add sub-delay -0.042
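
The 0.042 above is simply one frame's duration, 1 / fps, rounded to the millisecond. A quick sketch if your content runs at a different frame rate and you want the step to match (the function name is just for illustration):

```python
def frame_seconds(fps):
    # Duration of a single frame in seconds.
    return 1.0 / fps


for fps in (24000 / 1001, 24, 30, 60):
    print(f"{fps:.3f} fps -> {frame_seconds(fps):.3f} s per frame")
```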

I came from MPC-HC so I remapped many shortcuts to the same keys.
Add full-screen shortcut like that in MPC-HC:

F11 cycle fullscreen

Add screenshot shortcut like that in MPC-HC:

F5 screenshot video # Video stream screenshot (extract video frame).
shift+F5 screenshot # File stream screenshot (video frame + render subtitles/signs)
ctrl+F5 screenshot window # Window screenshot (screenshot current player frame including OSD, shaders, upscale, etc.)

Cycle subtitle & audio like in MPC-HC:

s cycle sub
S cycle sub down # shift + s cycle backwards
a cycle audio

Cycle ASS subs style override:

u cycle-values sub-ass-override "yes" "force" "strip" "no"
ctrl+, add sub-scale -0.05 # Decrease sub size by 5%
ctrl+. add sub-scale 0.05 # Increase sub size by 5%

Cycle adaptive sharpen shader:
Note: Requires AdapSharp glsl in mpv conf directory. When AdapSharp is active sigmoid-upscaling needs to be no.

g change-list glsl-shaders toggle "~~/adaptive-sharpen.glsl"; cycle-values sigmoid-upscaling "no" "yes"; show-text "glsl-shaders='${glsl-shaders}'\nsigmoid-upscaling=${sigmoid-upscaling}"

Cycle deband on/off and weak/medium/strong:
Note: the ‘weak’ first set of deband values here needs to match the values set in mpv.conf or the value order will be messed up (in this specific case 2:35:20:5).

h cycle-values deband "yes" "no"
H cycle-values deband-iterations "2" "3" "4" ; cycle-values deband-threshold "35" "45" "60" ; cycle-values deband-range "20" "25" "30" ; cycle-values deband-grain "5" "15" "30" ; show-text "Deband: ${deband-iterations}:${deband-threshold}:${deband-range}:${deband-grain}" 1000

Visit the wiki or the default input.conf for more info.

mpv Custom Build

Here is a custom built mpv with Vapoursynth enabled (for Windows). Also includes a custom built FFmpeg in the folder. The folder is portable, and the settings are also in the portable folder. The OSC is also custom, delete osc lua and conf files if you want the default back.

Note: \portable_config\ folder takes priority over %APPDATA%\mpv folder. (i.e. \portable_config\mpv.conf will load instead of ~~\mpv.conf.)

Download the 7z or zip archive.

UPDATE: Newer mpv build 7z archive, includes newest Anime4K stuff. HOWEVER, the newer builds post v29 have an OSC bug where it fails to load my custom OSC half the time. If this annoys you, delete ~~\scripts\osc.lua and \script-opts\osc.conf and just use the default UI. I also included the older mpv.exe renamed as mpv.exe.old if you want the older stable executable back, though the newer version should offer better compatibility with newer formats such as AV1 and HDR stuff.

Anime Encoding Guide for x265 (HEVC) & AAC/OPUS (and Why to Never Use FLAC)

Table of Contents

x265 Settings Guide

All my encoding parameters are tuned for Anime at 1080p. TL;DR at the end.

Why x265?

x265 is a library for encoding video into High Efficiency Video Coding (HEVC / H.265) video compression format. While developed as a replacement for H.264, due to early performance and licensing issues, it didn’t gain much traction, especially as a streaming format. While most devices now support HEVC in some form or another, adoption didn’t really kick off. Despite these downfalls, HEVC does compress much better in higher crf (lower bitrate) and has been a good codec to use since 2018. Most modern devices support hardware decoding (iOS, Android, laptops, Macs, etc.), and even at 1080p most older consumer PCs should be able to software decode just fine.

In terms of efficiency as of right now (2019/2020), when properly encoded, HEVC could save ~25% bandwidth minimum compared to AVC (H.264, with its encoder known as x264) from my observations. HEVC is about the same quality as VP9, a codec by Google. HEVC does have a slight advantage in terms of parallel encoding efficiency, though they are both just as slow when encoding compared to x264. VP9 today is mainly used by Google (YouTube) via the webm/DASH format, which Apple refuses to support on iOS devices which is probably why it’s not widespread. Update: Apple now somewhat supports VP9 on some devices for 4K streaming (iOS 14+).

What I’m more excited about is AV1. AV1 is a next gen codec with members from many tech giants (Apple, Google, MS, etc.) to create a royalty free codec. IIRC some of its baseline is based on VP10, which Google scrapped and had its codebase donated to develop AV1. Unfortunately the reference encoder libaom is still in early stages of development and is rather inefficient. SVT-AV1 is Intel and Netflix’s scalable implementation. It performs (quality-wise) not as well as libaom, but quality is still better than x265 (SSIM), and encode times are much more reasonable. For now, they look promising, and I am excited to see results in a few years. HEVC took 3-4 years before it was more widely accepted by Anime encoders, and took 5 years until 2018 before I began experimenting with it. AV1 began major development in mid-2018 so by that logic we got 2-3 more years to go.

What tools do I use?

Handbrake is a great tool for beginners. It allows reading Blu-ray discs (unencrypted), and has a pretty UI to deal with. While it runs on an 8-bit pipeline, this isn’t an issue for Anime. Hopefully a 10-bit pipeline will be supported once HDR becomes common. (Update: 1.4.0 is still under development but should be updated to use a proper 10-bit pipeline for HDR.) If you want FDK-AAC, you will need a custom compiled version. Guides on how to do this are easily found.

Other useful stuff for dealing with Anime (research them yourself):

  • MKVToolNix (MKV merge & Extract)
  • FFmpeg
  • jmkvpropedit
  • Notepad++ or similar text editor
  • Sushi
  • Batch scripting (.bat files)

Which x265 encoder? 8-bit, 10-bit or 12-bit?

To keep things short and not get into technical details: use the 10-bit encoder (Main 10 profile).

10-bit produces slightly smaller files while preventing banding from 8-bit compressions (this isn’t a joke, quantization and linear algebra is a mysterious thing).

Why not just use 12-bit then you say? Well, to put it simply: moving from 8 to 10 bit increases the color gradient available from 256 to 1024 levels to eliminate banding, but going to 4096 in 12-bit really isn’t noticeable. In fact, it is much less supported, and from my tests back in 2018 the 12-bit encoder is actually worse than the 10-bit encoder at high crf, likely due to less resources put into developing it.
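
For reference, the number of gradient steps per channel is just 2 to the power of the bit depth. A quick check:

```python
def levels(bits):
    # Number of distinct values a channel can take at a given bit depth.
    return 2 ** bits


for bits in (8, 10, 12):
    print(f"{bits}-bit: {levels(bits)} levels per channel")
```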

What preset should I use?

First, one must understand x265 is fundamentally different from x264. In x265, the slower the preset, the bigger the file size is at the same crf. While counter intuitive at first, this is due to the more complex algorithms used to more precisely estimate motion and preserve details. To put into perspective how important presets are in x265, a clip encoded at crf=16 preset=medium is actually worse in quality than crf=18 with preset=slow, while also being 50% larger in file size. One may argue SSIM isn’t the best representation as a quality metric, but the fact of the matter is that it’s an objective measurement readily available. Subjective tests I’ve done (using Kimi no Na Wa & NGNL Zero) also agree with the above statement. The move from fast to medium to slow each reduces bad noise and artifacts, though I do have to admit going any slower I could not observe any significant improvements (x265 3.1 has a revised veryslow preset that changes this).

Now to answer the questions which preset to use. In my mind, there are only 3 presets worth using: fast, slow, and veryslow.

The fast preset, whose quality is pretty “eh”, is the first “sweet spot” from the list and does serve a purpose for people with really weak systems. It is barely any slower than the faster presets while offering slightly better quality. However, it is rather lackluster in my opinion and should never be used to encode anime, especially those with dark/complex scenes.

slow is the second sweet spot as you could see from the graph below, and should be the preset most people should use. Yes, I know it isn’t the fastest at encoding videos (typically ~10fps for a modern 6-core desktop processor @1080p), but it does offer superior quality especially with fast moving objects in dark scenes.

Normally I wouldn’t recommend veryslow. It does, however, have its place, especially with the recent changes in v3.1. The veryslow preset is very useful at higher crf values (22+), with much better motion estimation at low bitrates. The only downside to veryslow is its encoding speed, which is a gazillion times slower than slow and requires a supercomputer. At lower crf values you barely get any improvements and thus could be ignored.

Presets vs. SSIM at 4K (source):

SSIM

Include 1% low & Encode time:
presets

Presets vs. file bitrate at 4K
preset size

Bonus conversation: Due to x264 being much less complex, presets can pretty much maintain only a slight loss in quality as you encode faster. This gives an illusion that slower presets are slower due to spending more time compressing. The more correct way to see this is that slower presets are slower due to doing more motion calculations and finding the best scheme that best describes the frame, which in x264 just so happens benefits compression too. However, x265 is waaay more complex with motion algorithms, meaning that accurately describing motion actually increases bitrate.

x265 Encoding Efficiency

For mainstream systems, just let x265 handle it automatically. For more advanced encoders that may have beefier CPUs, or even servers, this section may interest you.

x265 heavily favors real cores over threads, so keep that in mind if you use programs like process lasso. Theoretically, a 1080p encode with a default CTU size of 64 has an encode parallelization cap of 1080/64 = 16.875 threads. Beyond that x265 will not scale linearly. You could lower CTU size to 32, but you lose some compression efficiency (~1-5% depending on source complexity, personal tests show ~1-2% for anime @crf=19), especially with anime where CTU of 64 actually does benefit.
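
The cap used above is just frame height divided by CTU size, i.e. the number of CTU rows. A minimal sketch of that arithmetic (the function name is mine; treating rows as the thread cap follows the reasoning in this section, not anything read out of x265 itself):

```python
def ctu_rows(height, ctu_size):
    # Rows of CTUs in a frame; this section uses it as the thread cap.
    return height / ctu_size


for height, ctu in ((1080, 64), (1080, 32), (2160, 64)):
    print(f"{height}p, CTU {ctu}: ~{ctu_rows(height, ctu):.3f} rows")
```

Halving the CTU size or doubling the frame height both double the number of rows, which is why 4K and CTU 32 at 1080p end up with the same figure.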

For those interested, x265 heavily utilizes AVX2 instructions, which runs on 256-bit FPU for optimal speed. Keep this in mind when researching CPU choices.

The more threads you add into a pool, the more encode overhead you will experience, since every row of encode requires the upper right CTU block to complete before it can proceed. When the upper right block is more complex and slows down, the next row has to wait. More rows = more possibility of waiting to happen. This translates to about 30-50% efficiency when encoding with additional threads, with the value gradually lowering the more threads you have. Learn more about frame threading.

According to someone’s 128 core Azure VM test on 4K footage w/ preset=veryslow, encoding scales pretty much linearly up to 32 cores, and gets a bit dodgy at 64 cores. If we translate this to 1080p, in theory scaling should be good up to 16 cores, which agrees with our earlier theoretical calculation of ~16.9 threads.

Earlier we determined the theoretical limit of 1080p is about 16 threads. While at first glance an 8 core CPU should be the limit, remember that x265 favors real cores over threads. This means on a 16 core CPU, each encoding thread gets a real core to run on, not to mention there are also other processes in the encode chain that could use the extra threads. Due to x265 encode threads being terrible at sharing a core you still get good efficiency. Realistically speaking, if you really want to peg your 16 core CPU at 100%, I would run 2 instances of 16 thread encodes (you gain maybe ~5-10% efficiency), or lower the CTU to 32 to increase the theoretical limit to 33 threads. You can control thread count with the --pools option.

Those with CPUs that have multiple NUMA domains, look at the --pools option to see how to set x265 to run on a single NUMA node, then run as many instances as you have NUMA nodes, with each instance running on its own node.

Encoding Parameters

Now that we established we should always use preset=slow, let us look at parameters that you may want to use/override to improve quality. For test clips, I recommend NGNL Zero and Tensei Shitara Slime episode 1 as they represent pretty much the worst-case scenario for encoding anime (lots of dark scenes, fiery effects, glow effects, floating particles, etc.).

QP, crf and qcomp

Note: x265 also has ABR (average bitrate) and 2-pass ABR encoding mode which I won’t get into. As a quick summary: never use ABR, and only use 2-pass ABR if you absolutely must have a predictable output file size. 2-pass ABR will be identical to crf assuming the result file size are identical and no advanced modifications are made to crf (e.g. qcomp).

Before we begin, we need to understand the basics of how x265 works. QP, a.k.a. quantization parameter, controls the quantization of each macroblock in a frame. In QP encoding mode (qp=<0..51>), QP is constant throughout, and each macroblock is quantized (compressed) to the set QP target. I do NOT recommend using QP encoding mode, and I will explain why in a bit.

Keyword: Quantization – lossy compression achieved by compressing a range of values to a single value. Higher quantization (QP) = more compression.
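
To make "compressing a range of values to a single value" concrete, here is a toy sketch. The step formula is an assumption on my part: it is the common rule of thumb for H.264/HEVC that the quantizer step doubles every 6 QP, whereas real encoders use integer scaling tables and do far more than this.

```python
def quantize(value, step):
    # Snap a value to the nearest multiple of step (this is the lossy part).
    return round(value / step) * step


def approx_qstep(qp):
    # Assumption: rule of thumb that the step size doubles every 6 QP.
    return 2 ** ((qp - 4) / 6)


samples = [3, 10, 17, 24, 31]
for qp in (16, 22, 28):
    step = approx_qstep(qp)
    print(qp, round(step, 2), [round(quantize(v, step), 1) for v in samples])
```

As QP goes up the step gets coarser, so more of the sample values collapse onto the same output, which is where both the size savings and the detail loss come from.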

crf (crf=<0..51>), known as constant rate factor, encodes the video to a set “visual” quality. Keyword: Visual. The major difference between crf and QP is that QP encoding mode has a CQP (constant quantization parameter) whereas crf uses the QP as a baseline, but varies QP based on perceived quality by human eyes. Essentially crf can more smartly distribute bitrate to where it visually matters as opposed to QP encoding mode where it quantizes (compresses) constantly (constantly in terms of math, not to the eyes). For example, crf will increase QP in motion scenes due to motion masking imperfections, while decreasing QP in static scenes where our eyes are more sensitive. Additionally, in crf encoding mode, QP can be further manipulated with AQ and PSY options (discussed later).

The obvious downside to QP encoding mode, and especially crf is that it is almost impossible to determine the output size, particularly when modifying options that manipulate QP may result in drastic file size differences. However, as opposed to 2-pass ABR, crf guarantees that no matter which episodes you encode, they will all be at the same visual quality across.

One should always encode their own test clips and determine what crf they prefer and can accept in terms of size vs. quality loss. However, as a general guide (personal opinion):

  • I have a small laptop / I watch on TV / 21″ monitor: crf=20-23
  • I have a 24-27″ monitor: crf=18-21 (crf=18 is my lowest recommended value for Anime, note that going below crf=18 may increase file size quite rapidly.)
  • I have a 4K 27″ monitor/TV in my face and I want minimal artifacts: crf=16-18
  • I have a 4K 27″ monitor/TV in my face and I determine video quality by pausing the video and using a magnifying glass (lol): crf=14

Jokes aside, video quality should be assessed by watching, and not by pausing the video. If you can’t see the flaw without pausing the video, is it really a flaw after all? 4K requires a lower crf (higher quality) because upscaling often amplifies artifacts, and high quality upscalers (especially ones like FSRCNN) often benefit from a higher quality source.

The variability of crf can be manipulated with qcomp (quantizer curve compression factor; qcomp=<0..1>), but I recommend leaving it alone at the default qcomp=0.6. qcomp isn’t as important a variable as it was in x264 (since x264’s aq-mode=2 & 3 are pretty much broken). In crf encoding mode, a high qcomp leads to more aggressive QP reduction (higher bitrate) for complex scenes. aq settings also affect qcomp somewhat. crf in x265 is somewhat of a pain to tune and confusing to beginners because tons of intertwined settings control bitrate and quantization.

However, if you are encoding a source that is mainly either simple scenes or complicated motion, you can try increasing qcomp. Somewhere around qcomp=0.8 should be sufficient for even the most extreme cases. One interesting strategy to use for such a source is to increase crf by 1 and use qcomp=0.8. This results in a similar file size, but complex motion scenes essentially get allocated more bitrate than static portions. Beware of doing this to “average anime” as combining this with other quantization options (AQ, psy) incorrectly may lead to bitrate-starved normal scenes.

Now before I get to other parameters, note that it is possible to tune a crf=20 video to look better than one at crf=18. Lowering crf isn’t the be-all and end-all solution to everything, so make sure to read the following sections and do encode tests yourself.

bframes

Allowed values are <0..16>. For anime just use 8. 8 has minor savings (~3-5%) over default 4 with a small encode time penalty (~5%), 16 is pretty much only useful for static images (BD Menu) as the encode penalty isn’t worth the saved space (1.6% smaller than 8).

For a crf=20 episode (~200-400MB/ep), expect about a few MB savings per anime episode going from 4 to 8. While at first glance it isn’t worth it, on the higher end (static-ish low-fps slice of life) you might be able to save 15-30 MB per episode (~5-10% savings on the very extreme end).

If you don’t want to use 8, 6 should be the sweet spot since around 5-6 is where consecutive b-frames drop to single digit percentages for most anime.

You can also encode an episode with bframes=16 and look at the encode log to optimize for your content.

Example encode log with bframes=16, values represent consecutive b-frames percent from 0-16 (notice the sudden drop at 6 b-frames and another drop to decimals at ~8+ b-frames):

x265 [info]: consecutive B-frames: 18.8% 10.2% 19.2% 12.4% 8.3% 15.9% 4.7% 3.1% 2.9% 0.8% 1.2% 1.7% 0.3% 0.2% 0.1% 0.1% 0.2%
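To make the "look at the encode log" step concrete, here is a minimal Python sketch that parses the log line above and finds the smallest bframes value covering most of the B-frame runs. It assumes the Nth percentage (counting from 0) is the share of runs with N consecutive b-frames, as described above. The function names and the 95% coverage threshold are my own illustrative choices, not anything x265 defines.

```python
import re

# The log line quoted above, copied verbatim.
LOG = ("x265 [info]: consecutive B-frames: 18.8% 10.2% 19.2% 12.4% 8.3% "
       "15.9% 4.7% 3.1% 2.9% 0.8% 1.2% 1.7% 0.3% 0.2% 0.1% 0.1% 0.2%")


def parse_consecutive_bframes(log_line):
    """Return the percentages from a 'consecutive B-frames' log line."""
    return [float(p) for p in re.findall(r"(\d+(?:\.\d+)?)%", log_line)]


def smallest_bframes_covering(shares, coverage=95.0):
    """Smallest run length whose cumulative share reaches `coverage` percent.

    `coverage` is a hypothetical rule of thumb, not an x265 setting.
    """
    total = 0.0
    for run_length, share in enumerate(shares):
        total += share
        if total >= coverage:
            return run_length
    return len(shares) - 1  # never reached: fall back to the longest run


shares = parse_consecutive_bframes(LOG)
print("values parsed:", len(shares))
print("bframes covering 95% of runs:", smallest_bframes_covering(shares, 95.0))
print("bframes covering 90% of runs:", smallest_bframes_covering(shares, 90.0))
```

For this particular log the 95% cut-off lands on 8, which lines up with the recommendation above. Run it against your own log before trusting any threshold.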

bframes=n file size as a % of bframes=0 (Rokudenashi ep1 crf=19/17):
[Figure: x265 bframe]

ref

According to legend the more the better. Consensus is that optimally you should use ref=6 for 10-bit anime encodes. x265 allows values of <1..16>, although 8 is the “true maximum” x265 can currently use and anything more doesn’t actually improve quality.

A study from late 2018 showed that going from ref=1 > ref=5 > ref=10 > ref=16 improved quality by 0 > 0.04 > 0.13 > 0.13 PSNR @720p with 100% > 135% > 179% > 223% encode time penalty. Note that 10 to 16 changed nothing due to true max capped at 8. For reference, we measure the difference of presets at the 0.x magnitude for PSNR.
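To see the diminishing returns in those numbers, here is a small Python sketch that works out the PSNR gained per extra percentage point of encode time for each step. The figures are copied from the paragraph above. Reading "100% > 135% > 179% > 223%" as relative encode time is my interpretation, and the function name is made up for illustration.

```python
# (ref, PSNR gain over ref=1, relative encode time in %), as quoted above.
STUDY = [(1, 0.00, 100), (5, 0.04, 135), (10, 0.13, 179), (16, 0.13, 223)]


def marginal_gains(rows):
    """PSNR gained per extra percentage point of encode time, step by step."""
    steps = []
    for (ref_a, psnr_a, time_a), (ref_b, psnr_b, time_b) in zip(rows, rows[1:]):
        steps.append((ref_a, ref_b, (psnr_b - psnr_a) / (time_b - time_a)))
    return steps


for ref_a, ref_b, gain in marginal_gains(STUDY):
    print(f"ref={ref_a} -> ref={ref_b}: {gain:.4f} PSNR per extra % of encode time")
```

The last step (ref=10 to ref=16) comes out at exactly zero gain for 44 more points of encode time, which is the "true max capped at 8" effect showing up in the data.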

Optimally you should use ref=6, although I personally stick with the ref=4 that preset=slow defaults to. Note that ref=6 is the max you can go if you enabled b-frames and --b-pyramid. Newer versions of x265 also block non-conforming values.

Loop Filters: sao, limit-sao, no-sao and Deblock

SAO is the Sample Adaptive Offset loop filter. SAO tends to lose sharpness on tiny details, but improves visual quality by preventing artifacts from forming by smoothing/blending. I would leave this on for crf>=20. At crf=18 it depends on personal taste and the anime; most of the time I set it to limit-sao. I would not turn off sao with crf any higher than crf=16 unless you are trying to preserve extremely fine grain/detail.

deblock specifies deblocking strength offsets. I tend to leave it at default deblock=0:0 or deblock=1:1. If you want to preserve more grain/detail, you may set it to -1:-1. Please note in FFmpeg based programs you will need to type deblock=0,0 to pass the values, as : is a parameter separator.

Psycho-Visual Options: psy-rd and psy-rdoq

Traditionally, the encoder tends to favor blurred reconstructed blocks over blocks which have wrong motion. The human eye generally prefers the wrong motion over the blur. Psycho-visual options combat this. While technically less “correct”, which is why they are disabled for research purposes, they should be enabled for content intended for “human eyes”.

psy-rd will add an extra cost to reconstructed blocks which do not match the visual energy of the source block. In layman’s terms, it throws in extra bits to blocks in a frame that are more complex. Higher strength = favor energy over blur & more aggressively ignore rate distortion. Too high will introduce visual artifacts and increase bitrate & quantization drastically.

psy-rdoq will adjust the distortion cost used in rate-distortion optimized quantization (RDO quant). Higher strength will also prevent psy-rd from blurring frames.

If you didn’t understand any of that, don’t worry. Basically, these 2 options are crucial to QP manipulation and grain/detail preservation. psy-rd will decide the tendency to add extra cost (bitrate) to match source visual energy (i.e. grain, etc.) and psy-rdoq will control the extent of this extra cost. Too low and details will be blurred to improve compression (the reason why people hated x265 in the early days), too high and you create artificial noise and artifacts.

For anime, use psy-rd=1. On anime with some grain/snow/particles, or lots of detailed dark scene (often anime movies), set psy-rd=1.5 (e.g. Kimetsu no Yaiba). If grain is a main feature, or the whole series is dark, well mastered with details you may use psy-rd=2 (e.g. NGNL Zero has lots of fallout dust and complex details throughout the whole movie).

psy-rdoq is the key to preserving grain (and also quite aggressively lowers QP). Keep in mind that x265’s built-in --tune grain actually uses too high a value for slower presets, as it artificially creates even more grain. For anime, I would leave it at the default psy-rdoq=1. With some grain/CRT TV effects, I would set it to psy-rdoq=2 or 3 depending on how strong the effects are. For anime where grain effects are a staple throughout, or to eliminate blocking in complex fast motion scenes at lower crf (<16), you may increase the value to 4-5. Additionally, some anime use grain to prevent blocking/banding and may also need a higher value to prevent micro-banding. Note that these values apply to preset=slow. A higher value may be needed for faster presets.

Keep in mind both these options drastically increase file size, but also improve visual quality. On lower bitrate encodes, having too high of psy-rd may starve bitrate from flat blocks, and too high psy-rdoq may also create artifacts.

I recommend psy-rd=1 and psy-rdoq=1 for most of the anime out there. I sometimes use psy-rd=1.5 and rarely ever go to 2. I rarely use psy-rdoq >2 due to how much it increases bitrate. (To me, it’s not worth increasing the value to make that 10 second grainy scene look better. I only increase it when the whole show/movie has grain/grain-like objects.)

Note: Lowering pbratio and ipratio may also improve grain retention (more “real” frames over b/p frames), although I do not recommend touching them.

Adaptive Quantization Options: aq-mode and aq-strength

aq-mode sets the Adaptive Quantization operating mode. Raise or lower per-block quantization based on complexity analysis of the source image. The more complex the block, the more quantization is used. This offsets the tendency of the encoder to spend too many bits on complex areas and not enough in flat areas.

As this is beneficial for anime, you pretty much want this enabled. As for the modes:

  • 0: disabled
  • 1: AQ enabled
  • 2: AQ enabled with auto-variance (default)
  • 3: AQ enabled with auto-variance and bias to dark scenes. This is recommended for 8-bit encodes or low-bitrate 10-bit encodes, to prevent color banding/blocking.
  • 4: AQ enabled with auto-variance and edge information.

I highly recommend, in fact I think it pretty much is a must to use aq-mode=3 for anime. It raises bitrate in dark scene to prevent banding. Seasoned encoders will know dark scene with colorful glowing effects (i.e. fire) and dark walls with subtle colors are most prone to banding, blocking, and color artifacts. Setting aq-mode=3 is so beneficial to anime that a crf=20 encode with it looks better than a crf=18 encode without while having similar file size.

aq-strength is the strength of the adaptive quantization offsets. Default is 1 (no offset). Higher = tendency to spend more bits on flatter areas, and vice versa. Setting <1 in crf mode decreases overall file bitrate and reduces the bitrate spent on plain areas (but potentially introduces blocking/banding at higher crf). For anime, anywhere from aq-strength=0.7 to aq-strength=1 is acceptable depending on the show. I tend to leave it at default unless I feel that aq-mode is spending too many bits. Setting this high sometimes helps with grain preservation, but it is very expensive bitrate-wise and may cause halo artifacts.

aq-motion and hevc-aq are experimental features that are still broken but should be interesting to use in the future (unless AV1 beats it to the punch). From my tests hevc-aq lowers overall VMAF slightly but increases 1% low.

no-strong-intra-smoothing

Prevents bilinear interpolation of 32×32 blocks. Prevents blur but may introduce bad blocking at higher crf. I do not use this option on my encodes. For non-anime stuff, this option may help preserve small details from blurring such as hair. I recommend not using this option unless you have no-sao and low crf as sao has a bigger impact on blur.

no-rect

Disable analysis of rectangular motion partitions. rect is enabled for preset=slow and slower presets. Enabling rect may help reduce blocking in challenging scenes. For preset=slow, disabling it saves ~25% encode time at the cost of 1-3% compression efficiency.

I recommend not touching rect as this is the main difference between preset medium and slow, unless you really want to save the encode time. Do note that your video quality will decrease ever so slightly.

constrained-intra

I advocate always encoding directly from the Blu-ray disc, as you avoid re-encoding. Re-encoding (or to be more technical: generation loss) is very destructive for video quality, even more so than re-encoding audio.

However, not everyone can afford Blu-rays and rip them manually. If you are forced to re-encode (i.e. you got your video files from cough), ensure you have the highest quality encode possible and enable constrained-intra to prevent propagation of reference errors. For re-encodes I would not go below crf=20 as any lower simply isn’t worth it.

Keep in mind this isn’t a magic parameter to remove artifacting from re-encodes. Edges, especially edges close to each other (e.g. hair), tend to have jpeg-like artifacts in between. Dark scenes also suffer (since they’re usually the most artifact prone in an anime encode). Any artifacts from the source will also likely be amplified.

Note: Very rarely, but happened once to me, constrained-intra might cause encoding errors that look similar to missing p-frame data.

frame-threads

Number of concurrently encoded frames. Set frame-threads=1 for theoretical best quality and ever so slightly better compression. If you have <=4 core CPU you may consider this option. High core count systems will suffer greatly (encode speed) if set to 1. I found really no quality loss setting it to >1. As for the “max” to set before losing quality, from my test setting it from 2 to 16 yielded identical results to each other contrary to claims that setting it >3 hurts quality. Basically just let your system handle this value unless you really want to encode with frame-threads=1.

x265’s Biggest Flaw: Grain, Micro-Banding, and Grain-Blocking

As we discussed, x265 has a tendency to blur/smoothen to save bitrate. While this can be mitigated somewhat with psycho-visual options, encoders should be aware of what I call the “Micro-Banding” and “Grain-Blocking” phenomena. As we know, banding in x265 mostly occurs in dark scenes. To combat this, many studios are starting to inject dynamic grain in AVC 8-bit BD encodes (increasingly prevalent post 2018/2019). While extremely effective due to BDs having very high bitrates, this is actually detrimental to higher crf x265 encodes. On a smoky/fiery grainy scene, x265 tends to smoothen out each block, creating weird “patches” of regional grainy blocks (I call this “grain-blocking”, though do note it isn’t “blocking” per se and more of a “regional encode-block” color difference that still has a smooth gradient across). For example, the intro scene in ep1 of Tensei Slime exhibits this problem with smoke and fire effects.

x265 also introduces micro-banding when smoothening out bright objects in the dark with glow effects, and it is even more noticeable (slightly wider) when the source has dynamic grain. These micro-bands aren’t conventional banding, but extremely thin bands that only appear on dark scene objects with a strong color gradient change (e.g. edges of fire where color rapidly changes from white to red to yellow glowing then to dark grey in a short distance, or a glowing katana swinging sword effect). Micro-banding is a bigger pain, as even stronger deband filters cannot smoothen it out during post-processing (i.e. when watching), and unlike grain-blocking, where tuning psy values can easily fix it, micro-banding requires a much lower crf value on top of that to suppress. Luckily, micro-banding is much rarer in encodes and those few seconds in a movie are unlikely to harm the viewing experience.

There is no simple answer to fix these 2 problems due to crf targets. Traditionally in x264 such scene will simply end up in blocking artifacts. x265 chooses to eliminate artifacts at the cost of detail loss. The down side is that even at lower crf targets it is tough to eliminate x265’s tendency to blur. To truly eliminate such effects, you will first need no-sao:no-strong-intra-smoothing:deblock=-1,-1 to make x265 behave more like x264, then raise psy-rd and psy-rdoq accordingly (2 and 5 respectively should do the trick). However, this reintroduces unpleasant artifacts x265 aimed to eliminate in the first place, thus I do not recommend such encode parameters unless encoding crf <16 (in which case file sizes are so big just use x264, why even bother x265?).

Image set to greyscale, with color/gamma corrections to amplify the artifacts. Micro-banding is self-explanatory. For grain-blocking, you can see each block still retains the grains, but the average color for each block is different creating a “regional” grain-block effect.
[Figure: micro-banding]

Filtering

Unfortunately I am not familiar with AVS and VPY scripting and cannot give advice on how to do it. However, x265 benefits greatly from filtering and can avoid its flaws with proper filters, such as proper denoise with masking and custom deband shaders tailored to different encodes. x265 also has a tendency to magnify aliasing, so AA scripts should benefit encodes too.

TL;DR Summary for x265 Encode Settings

Set preset=slow. Then choose one of the following to override the default parameters. These are my recommended settings; feel free to tune them.

  • 1 Setting to rule them all:
    • crf=19-20, limit-sao:bframes=8:psy-rd=1:aq-mode=3
  • Flat, slow anime (slice of life, everything is well lit):
    • crf=19-20, bframes=8:psy-rd=1:aq-mode=3:aq-strength=0.8:deblock=1,1
  • Some dark scene, some battle scene (shonen, historical, etc.):
    • crf=18-19 (motion + fancy FX), limit-sao:bframes=8:psy-rd=1.5:psy-rdoq=2:aq-mode=3
    • crf=20 (non-complex, motion only alternative), bframes=8:psy-rd=1:psy-rdoq=1:aq-mode=3:qcomp=0.8
  • Movie-tier dark scene, complex grain/detail:
    • crf=16-18, no-sao:bframes=8:psy-rd=1.5:psy-rdoq=4:aq-mode=3:ref=6
  • I have infinite storage, a supercomputer, and I want details:
    • preset=veryslow, crf=14, no-sao:no-strong-intra-smoothing:bframes=8:psy-rd=2:psy-rdoq=5:aq-mode=3:deblock=-1,-1:ref=6

Side note: If you want x265 to behave similarly to x264, use these: no-sao:no-strong-intra-smoothing:deblock=-1,-1. Your result video will be very similar to x264, including all its flaws (blocking behavior, etc.).
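If you drive x265 through FFmpeg, the setting strings above have to end up in a single -x265-params value. Below is a minimal Python sketch of one way to assemble that command line. The file names are placeholders, the helper names are made up, and writing bare switches such as limit-sao as limit-sao=1 is an assumption on my part (x265 accepts 1 for boolean options, but check how your frontend wants them). Note that deblock must already use the comma form, since : is the separator.

```python
import shlex


def to_x265_params(settings):
    """Turn 'limit-sao:bframes=8:deblock=1,1' into a -x265-params value.

    Bare switches are written as 'name=1' (assumption: the frontend expects
    key=value pairs). Values must not contain ':' themselves.
    """
    parts = []
    for item in settings.split(":"):
        item = item.strip()
        if not item:
            continue
        parts.append(item if "=" in item else item + "=1")
    return ":".join(parts)


def build_command(src, dst, crf, settings, preset="slow"):
    """Build an ffmpeg argument list for a libx265 crf encode, audio copied."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx265", "-preset", preset, "-crf", str(crf),
        "-x265-params", to_x265_params(settings),
        "-c:a", "copy",  # audio passthrough
        dst,
    ]


# "1 Setting to rule them all" from the list above, at crf=19.
cmd = build_command("input.mkv", "output.mkv", 19,
                    "limit-sao:bframes=8:psy-rd=1:aq-mode=3")
print(shlex.join(cmd))
```

Building the command as a list and only joining it for display keeps quoting problems out of the picture if you later hand it to subprocess.run.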


Audio Codecs Guide

Why you should never use FLAC

Note: If your source isn’t FLAC/WAV/PCM, always passthrough (-c:a copy) to prevent generation loss.

FLAC, to put it simply, is a very inefficient use of data. A typical anime (23-24min) episode will have a FLAC audio size of 250MB. Now compare that to an AAC track of merely 20-30MB with basically no quality loss. And then there’s the issue of anime audio tracks being mastered well in the first place…
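To put those sizes in bitrate terms, here is a quick Python sketch converting between track size and average bitrate. It assumes a 24 minute episode, 1 MB = 1,000,000 bytes and 1 kbps = 1000 bit/s. The function names are just for illustration.

```python
def track_size_mb(bitrate_kbps, minutes):
    """Size in MB (10^6 bytes) of a track at a given average bitrate."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return bytes_per_second * minutes * 60 / 1_000_000


def average_kbps(size_mb, minutes):
    """Average bitrate in kbps of a track with the given size and length."""
    bits = size_mb * 1_000_000 * 8
    return bits / (minutes * 60) / 1000


EPISODE_MINUTES = 24  # assumed episode length

print(f"250 MB FLAC   ~ {average_kbps(250, EPISODE_MINUTES):.0f} kbps")
print(f"25 MB AAC     ~ {average_kbps(25, EPISODE_MINUTES):.0f} kbps")
print(f"192 kbps track ~ {track_size_mb(192, EPISODE_MINUTES):.1f} MB")
```

Under those assumptions the 250MB FLAC track works out to roughly 1389 kbps, ten times the bitrate of a 25MB AAC track.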

The “preserving audio quality” argument has always been the dumbest argument I’ve ever seen. I feel that this is partly due to the “audiophile” community blowing this issue out of proportion. Just yesterday I was on cough and saw a 24 episode cough that is 12GB large. The encoder argues that at this size the video had “barely any quality loss”. It was a re-encode at crf=22.5, with zero encoder tuning and preset at fast. Needless to say, it was blocking and artifacts galore. The icing on the cake? The audio was FLAC “to preserve audio quality”… I’m pretty sure there are more people in this world with 1080p screens than high-end headphones lol. The biggest mistake really wasn’t encoding with crf=22.5. Imo, it was the really imbalanced release. By simply using AAC you can allocate an extra 4GB towards video quality, with 99.9% of people not noticing any audio quality loss.

Note: The only exceptions where using lossless codecs makes sense are:

  1. You are in production using lossless codecs to prevent generation loss.
  2. Archival.
  3. Providing remux/RAWs for cough.

Being an audio enthusiast myself, I am a firm believer in double blind ABX testing, and I encourage people to use this method to find the lowest bitrate they can go to before being able to discern the difference.

I later provide audio samples (see the Encoders Comparison section) from various encoders for people to listen to. A great ABX tool I found is the ABX plugin for foobar2000 with replaygain enabled.

AAC or OPUS

OPUS

Ideally, one should use Opus. It’s an open codec, and outperforms even the best AAC encoders at very low bitrates. CELT/SILK encoding mode switches on-the-fly with voice activity detection (VAD) and speech/music classification using a recurrent neural network (RNN), ensuring the best encoding method depending on content.

It also performs exceptionally well with surround sound. Say we want to target the equivalent quality of a 128kbps stereo track (64kbps x 2 channels) on a 5.1 channel setup. Conventional codecs such as AAC simply use 64kbps x 6 channels = 384kbps (usually slightly less due to the LFE & C channels being lows/vocals only).

OPUS on the other hand can achieve similar quality with a much lower bitrate, the recommended formula being (# stereo pairs) x (target stereo bitrate). In this case with a conventional 5.1 setup (L & R, C, LFE, BL & BR), we have (2 stereo pairs) x 128kbps = 256kbps. You may need to use the conventional formula if your stereo pairs aren’t “stereo” pairs, i.e. the audio is significantly different.
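The two formulas are easy to compare side by side. Here is a minimal Python sketch that uses the 5.1 example above (6 channels at 64kbps each versus 2 stereo pairs at 128kbps each). The function names are made up for illustration.

```python
def per_channel_bitrate(channels, kbps_per_channel):
    """Conventional formula: (# channels) x (bitrate per channel)."""
    return channels * kbps_per_channel


def stereo_pair_bitrate(stereo_pairs, kbps_per_pair):
    """Opus rule of thumb: (# stereo pairs) x (target stereo bitrate)."""
    return stereo_pairs * kbps_per_pair


# 5.1 layout: L & R, C, LFE, BL & BR -> 6 channels, 2 stereo pairs.
conventional = per_channel_bitrate(6, 64)
opus = stereo_pair_bitrate(2, 128)
saving = 100 * (conventional - opus) / conventional

print(f"conventional: {conventional} kbps")
print(f"opus formula: {opus} kbps")
print(f"saving: {saving:.0f}%")
```

If your pairs aren’t really “stereo” pairs, fall back to per_channel_bitrate as noted above.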

This is due to OPUS using surround masking and taking advantage of cross-channel masking techniques to smartly distribute bitrate. Think of HE-AACv2 but better and optimized for higher bitrate + surround setups. Instead of distributing 64kbps per channel, more bitrate goes to the stereo pairs and less to the center/LFE channels where only vocals/bass exist. Next, by utilizing joint encoding (intensity stereo) and other techniques it “increases” the “bitrate” per channel. Obviously this is an oversimplification and the underlying technology is way more complex, but you get the idea.

Sounds cool, right? Why not just always use this sci-fi level magical codec then? Well you see, we already do, but in the form of commercial implementation (FaceTime audio, Discord, Skype, etc.). Opus is not widely supported in many container formats (only in .mkv, .webm, .opus (.ogg), .caf (CoreAudio format)), especially in the video world, essentially limiting users to those using 3rd party players. This leaves us…

AAC

AAC. An old codec developed to kill mp3, which still exists for some reason. AAC is a widely supported codec, just like the mp3 it was meant to replace. If a device can play music, 9 times out of 10 it supports AAC. Quality is about the same as Opus at higher bitrates. There’s also HE-AAC, which is used at low bitrates (~64kbps) with spectral band replication (SBR), and HE-AAC v2 with Parametric Stereo (PS), which is used at even lower bitrates (~48kbps).

I personally think that you shouldn’t use HE-AAC, especially HE-AAC v2, as its flaws are pretty obvious with a decent studio monitor/headphone. If such a low bitrate is needed, Opus is also much better.

Now that we’ve established AAC is the way to go for most users, now to the bad news: unlike Opus with 1 definitive official open source encoder, there are many encoders developed by many corporations for AAC, and some aren’t “free”.

Just like the licensing issues plaguing HEVC, the same can be seen in the AAC world. This means that without some computer knowledge, it’s pretty hard to get your hands on good AAC encoders.

Now to introduce the 4 most prominent AAC encoders:

The first is qaac, a tool utilizing Apple’s CoreAudio toolkit to encode Apple AAC. Mac users: you have direct access to the CoreAudio library and do not need any special tools. Apple AAC is known to be the best consumer available AAC encoder.

In second place is Fraunhofer FDK AAC, developed by Fraunhofer IIS, and included as part of Android. While open-source and once used in FFmpeg, due to licensing issues it has pretty much disappeared from any builds. To get it back, you need to compile FFmpeg yourself and enable the non-free flag. Once compiled, the build cannot be shared or distributed.

(There’s also the FhG AAC encoder from Fraunhofer bundled in WinAmp but it’s another complicated topic for another day. Beginners to using AAC, just pretend it doesn’t exist.)

Third place is Nero AAC, once pretty prominent in the AAC world as it was provided by Nero themselves in their software. It is currently outdated and should not be used.

Last is the FFmpeg 3.0 encoder. The AAC encoder in FFmpeg used to be trash; even at 128kbps it was hissing all over. The new improved encoder has “eh” quality and can be widely used with 1 caveat: only its CBR mode is considered ready. The VBR mode is still experimental (although my tests show that it is not any worse than CBR, a.k.a. they’re both “eh”). I do not recommend any less than 256kbps (as you will see later).

Encoders Comparison

My MEGA page contains all the audio file samples (backup link). Feel free to download and listen/ABX them. I chose the Z*lda theme song orchestra due to its challenging nature: brass instruments and cymbals. If anything will show a flaw at low bitrates, it’s going to be the trumpet. Basically, if a certain encode setting can handle this track, it can handle everything.

| Encoder | Size (bytes) | Bitrate (kbps) | Peak bitrate (kbps, excluding FDK initial burst) | FDK burst (kbps) |
|---|---|---|---|---|
| FLAC | 38745489 | 899 | 1201 | |
| FDK VBR Q1 HEv2 | 1341425 | 31 | 40 | 41 |
| FDK VBR Q2 HE | 2973163 | 68 | 101 | 142 |
| FDK VBR Q2 | 4051211 | 92 | 134 | 201 |
| FDK VBR Q3 | 4681758 | 107 | 144 | 208 |
| FDK VBR Q4 | 5812754 | 132 | 179 | 219 |
| FDK VBR Q5 | 9925912 | 226 | 302 | 351 |
| OPUS VBR 48kbps | 2131368 | 49 | 81 | |
| OPUS VBR 64kbps | 2841338 | 65 | 102 | |
| OPUS CVBR 64kbps | 2792245 | 64 | 66 | |
| OPUS VBR 96kbps | 4278459 | 97 | 131 | |
| OPUS VBR 128kbps | 5698682 | 131 | 167 | |
| OPUS VBR 192kbps | 8409104 | 194 | 234 | |
| qaac CVBR 40kbps | 1794573 | 41 | 54 | |
| qaac CVBR 64kbps | 2847934 | 65 | 86 | |
| qaac VBR Q27 | 3081110 | 70 | 91 | |
| qaac VBR Q64 | 5760371 | 132 | 154 | |
| qaac VBR Q91 | 8882577 | 205 | 227 | |
| qaac VBR Q109 | 11767124 | 272 | 296 | |
| qaac CVBR 256kbps (iTunes) | 11852276 | 274 | 310 | |
| FFMPEG VBR Q0.5 | 2701485 | 61 | 75 | |
| FFMPEG VBR Q1.0 | 4527582 | 104 | 133 | |
| FFMPEG VBR Q1.5 | 6049106 | 139 | 185 | |
| FFMPEG VBR Q2.0 | 8727091 | 202 | 249 | |
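For a sense of scale, here is a short Python sketch that expresses a few of the file sizes above as a percentage of the FLAC file. The byte counts are copied from the table. Which rows to compare is my own pick (the roughly 200kbps setting of each encoder).

```python
# Sizes in bytes, copied from the comparison table above.
SIZES = {
    "FLAC": 38745489,
    "FDK VBR Q5": 9925912,
    "OPUS VBR 192kbps": 8409104,
    "qaac VBR Q91": 8882577,
    "FFMPEG VBR Q2.0": 8727091,
}


def percent_of_flac(sizes):
    """Each lossy file's size as a percentage of the FLAC file's size."""
    flac = sizes["FLAC"]
    return {name: 100 * size / flac
            for name, size in sizes.items() if name != "FLAC"}


for name, pct in percent_of_flac(SIZES).items():
    print(f"{name}: {pct:.1f}% of FLAC")
```

All four land at roughly a fifth to a quarter of the FLAC size on this track.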

Encoders

FDK AAC

FDK AAC is probably the most accessible AAC encoder. It is built into FFmpeg and HandBrake (both require manual non-free compiling; check out guides on the official site / videohelp / reddit on how to compile HandBrake with FDK AAC, and for FFmpeg use media-autobuild-suite). Being only ever so slightly behind qaac makes it the perfect encoder for ripping Blu-Rays without demuxing/remuxing.

Keep in mind that every setting other than Q5 applies a low-pass filter, with a progressively lower cutoff the lower the bitrate is. I recommend using VBR Q5 due to its pretty high bitrate (~100kbps/channel), and to avoid the low-pass filter.

An interesting thing is that FDK tends to not respect VBR quality goal as well (i.e. VBR Q2 has a theoretical ~64kbps goal, but the result file is 92kbps), and allocate bits when it knows Q2 is simply too low for a specific file. This really isn’t an issue as on average FDK Q2 is indeed ~64kbps, just more of an FYI that FDK WILL throw more bits if a complex track needs it.

One “quirk” of FDK AAC is that it tends to allocate bitrate at the beginning for music (this is not a problem for anime as shown later) when it detects a complex beginning. This is why I do not recommend FDK AAC for music files as it tends to waste bits at the beginning adding up slowly. On the bright side over-allocation doesn’t impact quality (it “increases” actually haha).

[Figure: FDK Q5]

Apple AAC (qaac)

While in a perfect world everyone should use the CoreAudio encoder, the fact of the matter is that it requires demuxing the file, making FDK the best in terms of workflow in FFmpeg based programs. Some tools like StaxRip process files this way, but for most this is very inconvenient. (Mac users: the CoreAudio encoder can be used directly by programs like HandBrake.)

Apple AAC is more suitable for music (as expected, since it’s the encoder used for iTunes). When encoding quiet tracks (such as movies) it often undershoots its bitrate target by quite a bit (which can be mitigated by using a higher TVBR target or simply using CVBR). I really have nothing to say about it other than it’s really good, especially at low bitrates.

Apple AAC tends to respect bitrate targets much more than FDK AAC (with the exception of ultra low bitrates). It also favors allocating bitrate to music over speech.

Constrained VBR (CVBR) mode in Apple AAC constrains the minimum bitrate so it does not go too low, but does not limit the upper value the way Opus does. Keep in mind CVBR encodes identically to TVBR if the bitrate doesn’t drop below the threshold. Funnily enough, in the context of music tracks, CVBR behaves very similarly to FDK AAC with an initial burst.

[Figure: qaac]

Opus

Not much really to say here. You really have to experience it yourself to understand what I mean by Opus really is the next gen codec. It pretty much has no major flaws, and multiple public tests have proven Opus to be pretty much the best encoder out there.

Its 2 self-switching encoding modes SILK/CELT also make it perfect for both music and speech/vocals.

The only real down side to Opus (other than format support), is that it doesn’t have a quality mode and requires specifying a bitrate. For example, qaac Q91 has a target of ~192kbps. However, on complex tracks, it would not hesitate to go much higher just like the sample track provided. On tracks such as slice of life anime where much is just voice and quietness, the result bitrate will be much lower than 192. On the other hand, Opus pretty much always produces ~192kbps file. Additionally, qaac quality mode will base bitrate on per channel, whereas OPUS bases bitrate per audio track, meaning for OPUS you will have to manually set double/triple the bitrate for 5.1/7.1 surround sound audio encodes.

For streaming companies, Opus’s ability to respect bitrate so well is a huge advantage for networking and storage problems. For consumers like us however, this is a disadvantage as a quality mode would serve us much better. (Think of quality mode like crf in video encoders, huge file-to-file variation but ultimately equivalent quality for each file).

CVBR mode in Opus works differently from the one in AAC. Think of it as a VBR mode for CBR. It’s basically constant bitrate, but with a bit of wiggle room for very small momentary bursts.

Opus also has a “soft” low-pass filter from 16-20kHz, and starts becoming progressively aggressive <96kbps. I say “soft” because it isn’t a hard-limit, but decided by the encoder depending on content. For example, in my test track even as low as 48kbps, when even HE-AAC is low-passing <13kHz OPUS still allows trumpet harmonics up to 20kHz. Opus also momentarily boosts VBR to ~50-55 kbps as the encoder smartly determined that low-passing the trumpet will be detrimental to the overall quality.

[Figure: Opus VBR 192]

FFmpeg 3.0

Holy Jesus it’s bad. Anything lower than 256kbps produces farting noises with the trumpet (both CBR and VBR). Usually it isn’t this bad, but now you know why I chose this track to compare encoders.

On the bright side their VBR algorithm is pretty much spot on. Though according to their website VBR is still experimental and should not be used.

[Figure: FFmpeg]

Encoding Audio for Anime

FDK AAC vs. qaac

[Figure: FDK AAC vs. qaac]

Using Danmachi episode 1 as an example, you can see how both are really more similar than different. The only real takeaway is that qaac’s quality mode differs drastically with different content and really favors music (Q91 is ~192kbps for music, but in this case it undershot the target by a 30kbps margin). After using a higher quality target for qaac to compensate, you can see both have very similar bitrate allocation and variability, with the exception of the ED where qaac increases the bitrate compared to FDK AAC.

Opus vs. qaac

[Figure: Opus vs. qaac]

As both AAC and Opus are fundamentally different, we can’t really draw any conclusions from this.

One thing to note is that Opus really does respect bitrate targets VERY well. Oh and again, qaac loves to allocate bits to music over vocals (OP and ED peak).

All numbers in kbps.

| Encoder | Stereo | 5.1 | 7.1 |
|---|---|---|---|
| OPUS | 192 | 384 | 512 |
| qaac (Apple AAC) | -V 109 | -V 100 | -V 91 |
| FDK AAC | -m 5 | -m 5 | -m 5 |
| FFmpeg 3.0 | 384 CBR | DON’T BOTHER | DON’T BOTHER |

| Encoder | Notes |
|---|---|
| OPUS | Remember to enable the --vbr switch unless you need CBR. If your audio isn’t a conventional channel layout use the traditional (# channels) x 96kbps formula, else use (# stereo pairs) x 192kbps. Use 64kbps/channel or 128kbps / stereo pair if more compression is desired. |
| qaac (Apple AAC) | Use capital -V for true VBR. -V 109/100/91 is approx 128/112/96kbps per channel. -V 82/73/64 (~80/~72/~64 kbps/channel) can be used if more compression is desired. qaac tends to aggressively save bitrate for non-music content, thus a higher VBR mode is recommended to compensate. |
| FDK AAC | Always use VBR Quality 5 to avoid the low-pass filter. Q5 is approx 100kbps per channel. You may use Q4 (~64kbps/channel) if you don’t mind the low-pass filter. |
| FFmpeg 3.0 | I really do not recommend using this audio encoder unless it is stereo and a final compression render to upload to sites such as Youtube. |

Here is my general rule of thumb for audio quality vs. gear:

64kbps/channel for budget chi-fi gear (<$100 USD).
96kbps/channel for mid-fi gear (~$300-500 USD audio setups).
SnakeOil/channel for hi-fi gear ($1000USD+ setups). Jokes aside double-blind ABX test your limits, though I doubt anyone can differentiate past 160kbps/channel.

Conclusion

Thanks for reading!

Basically yeah, follow this guide and your encodes should be good. This is the end of the guide, but if you want to read my ranting about audio gear and stuff keep reading.


Bonus: Ranting on Audio Gear and Recommendations

I really do wonder to what extent people are able to ABX and discern the difference between compressed files and lossless. I personally have trouble beyond >96kbps stereo for most sources and beyond >160kbps I literally cannot tell the difference. I only own an HD 650 and Etymotic ER2/3/4XR so I do wonder what people with better gear can hear.

My journey into the “audiophile” world

Skip this section if you aren’t into my life story.

In middle school, just like everyone out there I owned a pair of V-shaped generic $20 IEMs. The forgotten days when I thought muddy bass=good, the days when a +20dB bass boost was about right.

High school is when I had the first taste of “real” audio gear. My first IEMs were the RHA MA750s. Classified as a “warm” IEM with a ~10dB sub-bass boost and bright highs, it was one of the best at its time in the $100 price range. Initially I thought it was really bass-light (lol), and the sharp 10K made me not like it as much.

In University, following the trend with everyone I got the infamous HD650s, and also ventured into the tube amp world (full of regrets, tube amp = money pit, coloured sound, and eh detail retrieval on most tubes). Here is when I got to experience what ‘soundstage’ is, got more used to a ‘natural’ sound signature.

While searching for an IEM upgrade after my RHA crapped out, I stumbled upon Etymotic’s ER3 series. I always knew that they were the benchmark for studio IEMs, but never really thought much of it until one fateful day, for some god damn reason, I ordered the ER3XR to try it out.

To my pleasant surprise, after spending a month with it and getting used to a neutral sound signature, I was really impressed. For classical, other IEMs are nothing compared to it in terms of timbre and accuracy, albeit the single BA does sound a tad ‘dry’ sometimes.

Sound “Quality”

My biggest complaint/gripe with the audio world is that many tend to associate “sound signature” with “sound quality”. So once again, I would like to scream at the world:

“STRONG BASS ISN’T SOUND QUALITY. IT’S YOUR SOUND SIGNATURE PREFERENCE!”

Fun fact: stronger bass can actually worsen sound quality, since boosted bass often creates distortion and worsens total harmonic distortion (THD) measurements.

(Below info relevant as of early 2021)

Part 1: IEMs

Beginners looking for recommendations for audio gear (IEMs): I highly recommend Etymotic’s ER2, ER3, and ER4 series, specifically the ER2XR for beginners (Diffuse-Field flat with a +5dB bass boost). They definitely aren’t for everyone with their house sound (Diffuse-Field target w/ slightly weaker treble). However, for those who want a taste of what a truly neutral IEM sounds like, these are unbeatable.

For those not into Diffuse-Field tuning, I recommend finding IEMs tuned to the Harman/modified-Harman 2017/2019 IEM target (more “mainstream”). I haven’t been keeping up with what’s best, but for lower-budget people chi-fi (Chinese-fi) is the way to go. Some recently popular chi-fi brands are Tin, KZ, FiiO, BLON, Moondrop, etc. Moondrop is especially hot on the radar these days, and other than its tuning, I can also attest to its sound quality from display units I tried.

Fun fact: Etymotic invented insert headphones.

There are 2 variants within the ER2/3/4 series, the SE/SR and XR (e.g. ER3XR). SE/SR is the studio version, and the XR is the bass-boosted (+3dB, or +5dB for ER2XR) variant. If this is your first time venturing into the neutral sound signature, get the XR variant. Get the SE/SR if you want a truly FLAT bass. Note that since the highs aren’t boosted like on most mainstream IEMs, the bass is surprisingly present because other frequencies aren’t drowning it out.

The ER2 are the cheapest (~$125 USD) of the bunch and use dynamic drivers (DD). They have an almost identical frequency response to the ER3, but due to the dynamic drivers they have better sub-bass (2dB stronger), sound more natural, and the bass packs more of a punch (likely from the slower decay of DD). If you listen to mainstream music, these are the ones to pick up. The value these offer is quite amazing at $125. I personally prefer the SE (non-bass-boosted) version.

The ER3 use a single balanced armature driver (BA) and are suited for people who listen to classical and the like, as they have better detail retrieval at the cost of bass sounding uh… a bit unnatural (difficult to describe, best way to put it is that it lacks impulse and dynamics). The ER3s are the best way to get into the Etymotic house sound (~$130-160 USD). These are fine with non-fast bass (bass guitar, non-synthetic bass drums, etc.), but once a track gets too complex (e.g. in EDM or metal when every instrument plays + heavy bass hits) the single BA sometimes gets overwhelmed and bass starts bleeding into the mids. Those who are used to DDs might find BAs sound a bit ‘dry’. I personally recommend the SE version over the XR version for the ER3, since it sounds more natural and is less prone to bass bleeding into the mids.

ER4 are the professional version of the ER lineup, and have a legendary history. They are manufactured in the States, with each unit having its FR and channel match certification. Get these if you want the best Etys can offer. They are a small upgrade from the ER3 so I do not recommend the ER4s if you already own the ER3s. For the ER4 I recommend the XR over the SR variant which is opposite of the ER3 recommendations (reasons are complicated, but the SR is perfectly fine if you don’t mind the flat bass).

The Etys use their infamous triple-flange eartips that may not be for everyone. Fortunately, due to their long nozzle design, using foam tips does not alter the frequency response (Innerfidelity has proven this). I personally use Comply foam tips with them. There are also many aftermarket tips that fit them, such as the Spinfits CP-800.

Part 2: Full-Sized Headphones

As a brief summary: closed-back headphones have better noise isolation, while open-back headphones leak sound but often have better soundstage.

I haven’t been keeping up in this market segment for years, so it’s up to you to research. Some big name brands in no particular order: Sennheiser, HiFiMan, Audeze, Beyerdynamic, Audio Technica, AKG, Grado, Fostex (LOTS of derivative ‘brands’ from modded T50 series), Focal, Philips, Sony, etc.

One thing to keep in mind is that full-sized headphones are often harder to drive (especially planars, don’t let the low impedance trick you) and require a dedicated amp.

Part 3: DAC/AMP

Note that new products come out every month so you should always do your research.

Also be very careful: other than cables, DAC/AMPs are the most prone to snake oil claims. Many might know the legend NwAvGuy, who one day stormed into the scene, created an open-source Objective 2 AMP/DAC design that outperformed competition 10x its price, then disappeared without a trace. His story is a long one reserved for another day, but thanks to him modern gear is mostly more about objective measurements than subjective claims.

I usually search for an amp that is suitable for 1.5-2x my headphone impedance. This is because headphone impedance is rated at 1kHz, whereas the impedance often isn’t flat across the frequency range (stares at Senns). (Example: HD650 is rated at 300 ohms but its peak impedance is 500 ohms at 80Hz.) Update: RIP innerfidelity, hopefully someone has that link archived somewhere

Portable DAC/AMPs are rarely powerful enough to drive 300 ohm class headphones. Amps that are will probably be bad at driving low impedance IEMs (volume matching, output impedance problems, etc.). It’s basically picking your poison and finding your needs.

For IEMs: Your goal is to find an amp that is clean (high SINAD) and quiet (low noise-floor). Try to find the lowest output impedance possible using the 1/8 (or 1/10) rule: the output impedance of your amp should be at most 1/8 or 1/10 (depending on who you ask) of your headphone impedance. Typically this means looking for <=1 ohm for IEMs. Portable amps often work well because they are battery powered (clean DC power source). Digital DAC volume control is a plus to ensure channel matching.
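
The 1/8 (or 1/10) rule above is just a division, so here is a minimal sketch of it. The function names are made up for this example, and the check uses the rated headphone impedance, which (as noted earlier) is only a single-frequency number.

```python
def max_output_impedance(headphone_ohms: float, ratio: int = 8) -> float:
    """Highest amp output impedance allowed by the 1/ratio rule."""
    return headphone_ohms / ratio


def passes_rule(amp_out_ohms: float, headphone_ohms: float, ratio: int = 8) -> bool:
    """True if the amp's output impedance is within 1/ratio of the load."""
    return amp_out_ohms <= max_output_impedance(headphone_ohms, ratio)


if __name__ == "__main__":
    # Hypothetical loads: a 16 ohm IEM and a 300 ohm headphone.
    for load in (16, 300):
        for ratio in (8, 10):
            limit = max_output_impedance(load, ratio)
            print(f"{load} ohm load, 1/{ratio} rule: amp output <= {limit:.2f} ohm")
    # A 10 ohm output is right at the limit for an 80 ohm headphone.
    print(passes_rule(10, 80))
```

With a low-impedance IEM the limit lands at a couple of ohms or less, which is why aiming for <=1 ohm is the safe default.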

For high-impedance headphones: Your goal is to find something that does well at high gain with low distortion. Output power calculators (Site 1 & 2) are your friend. You often find people complaining about weak/flabby/distorted bass on such headphones and a weak amp is probably the cause.
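
If you are curious what those output power calculators do under the hood, here is a rough sketch of the usual math, not the code of any particular site. It assumes sensitivity is given in dB SPL per mW and uses the rated impedance only, so treat the result as a ballpark. The example headphone numbers are made up for illustration, not measurements of a real model.

```python
import math


def required_power_mw(target_spl_db: float, sensitivity_db_per_mw: float) -> float:
    """Power in mW needed to hit the target SPL (every +10 dB costs 10x power)."""
    return 10 ** ((target_spl_db - sensitivity_db_per_mw) / 10)


def required_vrms(power_mw: float, impedance_ohms: float) -> float:
    """RMS voltage the amp must swing into the load, from P = V^2 / R."""
    return math.sqrt(power_mw / 1000 * impedance_ohms)


def required_ma(power_mw: float, impedance_ohms: float) -> float:
    """RMS current in mA the amp must supply, from P = I^2 * R."""
    return math.sqrt(power_mw / 1000 / impedance_ohms) * 1000


if __name__ == "__main__":
    # Hypothetical headphone: 300 ohm, 100 dB SPL/mW, aiming for 110 dB peaks.
    impedance, sensitivity, target = 300, 100, 110
    power = required_power_mw(target, sensitivity)
    print(f"power:   {power:.1f} mW")
    print(f"voltage: {required_vrms(power, impedance):.2f} Vrms")
    print(f"current: {required_ma(power, impedance):.2f} mA")
```

For that made-up 300 ohm example the answer is about 10 mW, but it needs roughly 1.7 Vrms to get there. High-impedance headphones are mostly limited by voltage swing, which is where weak portable amps run out.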

Recommendations

Crossed out some devices. I have not been keeping up with the scene now that I’ve found my end-game: the Apple Dongle. Yes, I’m dead serious.

  • Uber Budget: Apple USB-C to 3.5mm Dongle ($10)
    • No, this isn’t a joke. Make sure you get the US version, as the EU version is weaker and doesn’t measure as well due to reasons (EU volume limit shenanigans). These measure insanely well for $10 (99dB SINAD / 113dB SNR), and are perfect for even end-game IEMs because they are really clean with a non-existent noise-floor. Suitable for <50 ohm IEMs / non-planar headphones. Can be used on Windows with no problems, although Android users may have volume issues due to a config bug (can be mitigated by using exclusive mode, e.g. via a USB audio driver app).
  • Bluetooth: EarStudio ES100 ($100) Not sure what’s good these days.
  • Budget Desktop DAC/AMP Combo: FiiO K3 ($100)
    • Designed to replace the infamous E10K. While measurements aren’t the best in 2019, it packs a lot of functionality (Optical, RCA, etc.) and is pretty good for its price for an all-in-one. Also has an actually good 6dB bass-boost switch for watching movies (a 6dB bass boost on D-F tuning is close to the Harman target’s 7dB bass boost). Recommended for <150 ohm non-planar headphones and IEMs.
  • Budget Desktop DAC/AMP Combo (More Power): FiiO K5 Pro ($150)
    • Has a surprisingly good AMP and OK DAC. Can even drive the HD650s well. Basically a much more powerful desktop K3 that can drive ~300ohm / planars with no problem.
  • Budget Portable DAC/AMP Combo: Topping NX4 ($160)
    • Pretty much better measurements than the FiiO K3. Can even drive the HD650s. However, QC isn’t as good as FiiO’s.
  • Mid-Range Desktop DAC/AMP Combo: Topping DX3Pro ($220)
    • A really good all-in-one unit with good measurements. Not as good as separate DAC AMP units but for functionality (BT, preamp, etc.) and desk space friendliness it is unbeatable. If you get the newer v2 LDAC version, unfortunately its output impedance is 10 ohms so make sure your headphone is >80 ohms (1:8 rule).
  • Mid-Range DAC & AMP units: JDS Atom AMP/DAC ($100/$100), Schiit Heresy ($100), Grace SDAC ($79), Khadas Tone Board ($100), or similar products (e.g. SMSL/Topping DAC/AMPs)
    • These are popular entry-level single units. There are also many good Chinese DACs in the $100 price range, although it might be more of a hassle to acquire one (Aliexpress, warranty issues, etc.).
  • Upper-Range SE DAC/AMP: JDS Element II ($399)
    • The AMP unit is very good (the Atom was derived from this unit during research). The DAC chip is definitely the weakest link in this unit. However, if you need a nice looking DAC/AMP and don’t care for balanced connectors, this is still a very good choice. You’re definitely paying some premium for the looks though.
  • High-End Stuff: I will refrain from recommending anything specific, but here is a random list that might interest you. Always do your own research and never blindly trust strangers.
    • Massdrop THX AAA 789
    • ADI-2-DAC
    • SMSL SP200 THX
    • SMSL SU-8 v2
    • SDAC Balanced
    • Schiit Modius
    • Topping DX7Pro
    • Other brands: THX powered AMPs, Chord, iFi, etc.