mpv Configuration Guide for Watching Videos


I also have a guide for setting up MPC-HC and MadVR if you’re less into tweaking. Other then being slightly less resource efficient at the lower end, imo MPC paired with MadVR is just way more convenient to use.

Table of Contents

These are my recommended settings. Adjust the settings suitable to your system.

Get mpv at or custom mpv build w/ vpy enabled configured by me. Just extract the archive to a desired directory. I recommend not using a directory that requires admin/superuser privileges. For Windows, an install script should be included but is not required, as all it does is register mpv for file association. For Mac, there’s also IINA which has a prettier UI although some advanced options may not work and may require more tweaking for power users.

The mpv.conf file can be found in %APPDATA%\mpv on Windows and ~/.config/mpv/ for Linux/Mac. Alternatively, \portable_config\ folder can be used in the mpv executable directory on Windows to make it portable, and will take highest priority over %APPDATA%\mpv.

Read the documentation for all available commands.

To illustrate how the config file works:

# This is a comment.
active-config # This is a comment.
active-option-value=222 # active-option-value is now set to 222 instead of 111. 

Anything behind a # is disregarded by the program as a comment and is inactive. Anytime you write a new line that is present already, it overrides it. This means if you write scale=ewa_lanczossharp, then in the next line write profile=gpu-hq, the scale=spline36 from the gpu-hq profile will override the settings you set earlier.

If you are still using notepad to edit files, I recommend using VSCode, Notepad++ or similar software to edit config files to make your life easier.

GPU Selection

If you have a laptop, you need to select your dedicated graphics for mpv if you wish to use advanced filters/scalers. I don’t have a laptop by me right now, but the power saving option in the screenshot should be the iGPU while the high performance option should be the dedicated GPU.

You need to add the executable to the list if you don’t see it (mpv.exe).


General mpv Options

Enable this section if you use SVP 4, requires Vapoursynth mpv build.
I also have an SVP 4 guide if you’re interested.

#input-ipc-server=/tmp/mpvsocket # *nix only
input-ipc-server=mpvpipe # Windows only

Built-in high quality profile. Enables better upscaling algorithm. Recommend GPU minimum of newer Intel Iris iGPU, AMD VEGA 8 iGPU, or NVidia MX150. Add this option at the beginning of the file to prevent overriding options!


(Optional) Enabled newer Vulkan API. Disable if compatibility/performance issue. May improve performance when running complex shaders. Default has been updated to use 3D311. Set to opengl if you’re having issues.


Setup proper color profile to prevent oversaturation on high-gamut monitors. Many IPS monitors these days can achieve near DCI-P3 performance and look oversaturated, especially with the reds/greens. (Oversaturation is pretty on nature videos, but makes human skin look odd and gets old real quick.) If you can calibrate the monitor to get an icc profile, use it. If not, you can set a target primary if you know your display color gamut. Only enable 1 option!

icc-profile="C:\Windows\System32\spool\drivers\color\Example.icc" # Use defined icc file
#icc-profile-auto # Use this if your system has a default icc profile setup already.
#target-prim=dci-p3 # Use this if you don't have a calibrated icc profile, but know your monitor's gamut

Override default language selection order:

alang = 'jpn,jp,eng,en'
slang = 'eng,en,enUS' # enUS for Crunchyroll.

Some other UI options:

keep-open=always # Prevents autoplay playlists. Set to 'yes' to autoload. Both "always" and "yes" prevents player from auto closing upon playback complete.
reset-on-next-file=pause # Resumes playback when skip to next file

window-scale=1 # Set video zoom factor. (High DPI screens want 1.5 or even 2)
autofit-larger=1920x1080 # Set max window size. Can also set something like "75%" to mean max window size is 75% of screen height/width
autofit-smaller=858x480 # Set min window size.

no-osd-bar # Hide OSD bar when seeking.
osd-duration=500 # Hide OSD text after x ms.
osd-font='Trebuchet MS'

If you have a custom osc.lua, enable this or you will be playing loading roulette with a different or broken OSD every time you open mpv.

osc=no # Disables default OSD. Use this option ONLY if you have a custom osc.lua

Configure network/youtube-dl options:

ytdl-format=bestvideo[height<=?1080]+bestaudio/best # Set max streaming quality as 1080p.
# Default demuxer is 150/75 MB, note that this uses RAM so set a reasonable amount.
demuxer-max-bytes=150000000 # 150MB, Max pre-load for network streams (1 MiB = 1048576 Bytes).
demuxer-max-back-bytes=75000000 # 75MB, Max loaded video kept after playback.
force-seekable=yes # Force stream to be seekable even if disabled.


screenshot-png-compression=7 # Setting too high may lag the PC on weaker systems. Recommend 3 (weak systems) or 7.

Video Config

Denoise filter. Recommend keeping it off unless watching CRT era stuff. (May not work, filter requires vo=vdpau or vo=gpu. See doc.)


Deband filter. Always turn on for anime. Turned on by default if profile=gpu-hq is set.

deband=yes # Default values are 1:64:16:48

Deband parameters configuration. For Anime, 2:35:20:5 recommended for general use. Use 3:45:25:15 for older DVD, badly mastered BD or WEB streams. Use 4:60:30:30 for really, really bad streams.

deband-iterations=2 # Range 1-16. Higher = better quality but more GPU usage. >5 is redundant.
deband-threshold=35 # Range 0-4096. Deband strength.
deband-range=20 # Range 1-64. Range of deband. Too high may destroy details.
deband-grain=5 # Range 0-4096. Inject grain to cover up bad banding, higher value needed for poor sources.

Note: For older/weaker iGPUs, instead of increasing deband-iterations you may need to increase deband-threshold instead if you need a stronger effect. I recommend 1:60:25:30 if you absolutely must run 1 iteration (lower quality but much less GPU usage).

Set to auto for Anime due to 8 and 10 bit encodes. Set to no if your monitor has built-in dither (just leave it at auto if you aren’t sure).


Audio Config

volume=100 # Set volume to 100% on startup.
volume-max=100 # Set player max vol to 100%.

Subtitle Config

Enable if subs are broken or you need legacy ssa support.


Enable to modify PGS subs.

#sub-gauss=0.5 # Blur PGS subs.
#sub-gray=yes # Monochrome subs (makes yellow font grey).

Allow loading external subs that do not match file name perfectly.


Default subtitle font when none are specified.

sub-font='Trebuchet MS'
sub-bold=yes # Set the font to bold.
#sub-font-size=55 # Set default subtitle size if not specified.

Advanced Video Scaling Config

Internal Scalers

Note: Press shift + I in mpv to view frame drops. Then press 2 to view frame times and processing layers. Make sure your config does not drop frames and ideally frame times should be <25ms.

scale is the luma upscale method. Prioritize resource on this over cscale.
dscale luma downscale method.
cscale chroma upscale method. Human eyes aren’t as sensitive to chroma compared to luma.

If you enabled profile=gpu-hq earlier, these are the options set by it (no need to include these lines again):

#Scaling algorithm set by profile=gpu-hq

Default gpu-hq should be sufficient for most people, however, here are some suggestions for fine tuning:

I have a Toaster (crappy PC) edition:

scale=bilinear # Set spline16 if possible.

I have an iGPU laptop edition:


I have a decent GPU (GTX 1050+) edition:

cscale=spline36 # alternatively ewa_lanczossoft depending on preference

External Shaders

For those who have really good GPUs (GTX 1060+, sometimes need even better GPU) and want to run external shaders:

Remember to download the shader files and put them in the mpv config folder! ~~/xxx.glsl refers to xxx.glsl file in the mpv config directory.

Always enable profile=gpu-hq before using shaders for fallback at the beginning of the file.

DO NOT COMBINE shaders of the same type mindlessly. Just choose one. You’ll bork the image if you run multiple filters, they’re not designed to work together.

Dynamic Scaler: SSSR
SSSR is a dynamic scaler that improves built-in scalers utilizing structure similarity. Upscale result varies widely depending on your scaler. You may freely choose your preferred one. Moderate to highly GPU intensive. This is my personal favorite due to its versatility.

For a sharper image:

glsl-shader="~~/SSimSuperRes.glsl" # Set B C parameter to Robidoux.

Alternatively set to haasnsoft for a softer look (much better at artifact/aliasing supression). I found this combo to be very good for anime and performs close to NN based upscalers.

glsl-shader="~~/SSimSuperRes.glsl" # Set B C parameter to Mitchell.

If you really want to run SSSR on a lower-end GPU, the full power version of MX150 (1D10 25W) should be able to run it if you use the faster algorithms. Results will still be better than spline36. Use bilinear/spline16/spline36 (whichever your GPU can handle without dropping frames) for a sharper image. For anime, if you want a softer anti-aliasing look like haasnsoft, use bicubic.

glsl-shader="~~/SSimSuperRes.glsl" # Set B C parameter to Robidoux for sharper image, else use Mitchell (slightl sharper) or Catrom (default).
scale=bicubic # or bilinear or spline16/36

You can modify the SSSR shader’s cubic B and C parameters (on line 31 and 79) to your needs. Default is Catrom (0,1/2). Set to Mitchell (1/3,1/3) or even Robidoux (0.3782,0.3109) for a sharper image. See this graph.

2x Pre-Scaler: Neural Network Scalers
Some of these shaders require high-end GPUs to run smoothly. 2x pre-scaler means they scale a fixed 2x and should only be used if you have a 4K monitor (1080p x2 = 4K). For NN upscalers, more neurons = better quality but significantly more computational power.
Always remember to set profile=gpu-hq first (at beginning of file).


Choose 1 of the 2x pre-scalers:

  • FSRCNNX is very good for general use / real-life content, may amplify artifacts (due to it being true to its source). Requires good source material. MadVR equivalent would be NGU-Sharp. Very GPU intensive.
  • RAVU Lite is relatively lightweight and decent for anime due to the lite version being sharper. I would use SSSR over this due to SSSR being better imo. Moderately GPU intensive.
  • NNEDI3 is designed for anime use. Result is very close to MadVR’s NGU-AA. Very GPU intensive.


cscale=spline64 # or ewa_lanczossoft (or your choice, really)

Dynamic Scaler for Anime: Anime4K

Do NOT mix Anime4K with other shaders/scalers.

Anime4K is a unique upscaler + shader designed for Anime. It used to be very destructive, sharpens lines for a… unique “visual quality” look. It works extremely well in some anime and eh in others. Definitely isn’t for everyone. Version 0.9 had lots of flaws (mainly texture banding and line bloating). 1.0RC fixes a lot of the flaws (mainly bad banding) at the cost of doubling GPU consumption. 1.0RC2 optimized GPU usage (even more effecient than 0.9, 1080p > 4K should just barely run on a GTX 1030) and also introduced 2 more speed profiles for even weaker systems (supposedly the fastest profile can run on iGPUs at the cost of shader quality).

The latest 1.0RC2 tones down the “filtered look” even more and serves as an upscaler + filter of {line thinning + line smoothening + sharpening} + artifact/texture-denoiser (reduces jpeg-like edges) combo (note: filter only active when upscaling). One down side to Anime4K compared to other upscalers is that due to line thinning, anime with bad aliasing effects looks even worse (stares at badly made isekai anime magic circles).

Latest Anime4K have multiple glsl files that can be mixed and matched. See documentation for how to set it up for mpv.

# See Anime4K documentation, setup has changed a lot since the 1.0 days.
# Following is an example for v3.1 setup for 1080p scaling + post-processing

For those interested in higher quality downscale:
SSIM downscaler is tuned to be used with mitchell. However, you shouldn’t be downscaling in the first place so just having dscale=mitchell fallback should be good enough.


For those inteested in higher quality chroma upscale:
Imo cscale=spline36 is already very good. Human eyes aren’t that sensitive to Chroma compared to Luma. I won’t be surprised if you can’t even tell the difference when simply using cscale=mitchell. However, if you just want the best of the best and have the extra processing power, KrigBilateral is a high quality Chroma upscaler.


I always advise testing out settings yourself, “quality” is extremely subjective. For example, in anime, destructive filters (denoise/de-ring/sharpen) upscalers may “look better” visually but not necessarily be true to its source (lower SSIM/PSNR).

If we go purely by measurement, haasnsoft should be a terrible upscaler (and it kinda is with normal content). However, when paired properly with SSSR it looks great in anime. Similarly, NNEDI3 looks better with Anime (as it’s trained for such) despite FSRCNNX is the technically better (accurate) upscaler.

Anime “visual quality” greatly favors algorithms that de-aliases, denoises and sharpens lines to be defined. This is why Anime4K is such a controversial upscaler as it takes these concepts to an extreme.

In terms of algorithm quality (Note: better quality usually = more GPU usage):

abc = abc built-in shader
[xyz] = xyz external shader (requires downloading glsl/hook files)

For scale:
[NNEDI3] or [FRCNNX] > [SSimSuperRes + haasnsoft/ewa_lanczossharp] > ewa_lanczossharp > spline36 > spline16 > bilinear
                <------------------------[Anime4K] (personal preference)------------------------->

For dscale:
[SSimDownscaler] > Mitchell

For cscale:
[KrigBilateral] > spline64 / ewa_lanczossoft / spline36 > mitchell

Upscalers Demo:
Open in new tab to view full size.
2x upscale from 100×54 to 200×108 image (Source: DanMachi).
NNS = nearest neighbor (also interger upscale in this case, essentially original image). mpv default is bilinear. Anime4K shown is older 1.0RC ver.

Personal mpv.conf file

Note: I’ve since upgraded my system, current sample file is tailored towards a mid-end desktop with a DCI-P3 monitor intended to be used with SVP. Please modify the file to your needs and not blindly copy-paste.

###--- Enable this section only if you use SVP 4, requires Vapoursynth mpv build ---###
#input-ipc-server=/tmp/mpvsocket # *nix only
input-ipc-server=mpvpipe # Windows only
###--- End of SVP Section ---###


#gpu-api=vulkan #Default is now updated to use D3D11 which works better with my current setup.

target-prim=dci-p3 # Clamp video color to display correctly on my DCI-P3 monitor.

# UI/Behavior setup.
alang = 'jpn,jp,eng,en'
slang = 'eng,en,enUS'
autofit-larger=75% #Set Resolution (HorizontalxVertical or Screen%, i.e. 1280x720 or 50%)
osd-font='Trebuchet MS'

#osc=no # Hide OSC. Use this option ONLY if you have a custom osc.lua

# Configure network/youtube-dl options:
ytdl-format=bestvideo[height<=?1080]+bestaudio/best # Set max streaming quality as 1080p.
demuxer-max-bytes=150000000 # Max pre-load for network streams (1 MiB = 1048576 Bytes).
demuxer-max-back-bytes=75000000 # Max loaded video kept after playback.

# Screenshot:
screenshot-png-compression=3 # Setting too high may lag the PC.

# Deband filter. Always turn on for anime.
deband=yes # Default values are 1:64:16:48

# Deband parameters configuration.
deband-iterations=2 # Range 1-16.
deband-threshold=35 # Range 0-4096.
deband-range=20 # Range 1-64.
deband-grain=5 # Range 0-4096.




glsl-shader="~~/shaders/KrigBilateral.glsl" # High quality chroma upscaler.

input.conf Config

You may create an input.conf in the same directory as mpvconf to override default shortcuts. Here is a cheatsheet for the default shortcuts:


Keep in mind depending on your country keyboard, you may need to modify the file accordingly. For example, the key = has been ‘incorrectly’ mapped as + for default shortcuts (e.g. alt + = for zoom is mapped as alt + + which won’t work on the US keyboard). This is because in some countries the = key is the + key, and while on the US keyboard they are the same key the + is actually shift + =.

Basically when mapping, a means pressing a, but A means shift + a.

To fix the incorrect zoom shortcut:

Alt+- add video-zoom -0.1
Alt+= add video-zoom 0.1

To modify skip durations and add custom skip shortcuts:
Note that keyframe seeking is much faster (no render lag) but may not be exactly x seconds.

# Seek 5s exact, do not display OSD.
RIGHT no-osd seek  5 exact
LEFT  no-osd seek  -5 exact
# Seek 5s to the closest keyframe.
Ctrl+RIGHT  seek  5
Ctrl+LEFT   seek -5
# Seek 20s exact.
alt+RIGHT no-osd seek 20 exact
alt+LEFT no-osd seek -20 exact

Modify mouse wheel to control volume:

WHEEL_UP add volume 10 # In volume %.
WHEEL_DOWN add volume -10

Add audio delay hotkey:

ctrl+= add audio-delay 0.1 # In seconds.
ctrl+- add audio-delay -0.1

Add subtitle delay:

. add sub-delay +0.042 # 0.042s is 1 frame for a 24fps video
, add sub-delay -0.042

I came from MPC-HC so I remapped many shortcuts to the same keys.
Add full-screen shortcut like that in MPC-HC:

F11 cycle fullscreen

Add screenshot shortcut like that in MPC-HC:

F5 screenshot video # Video stream screenshot (extract video frame).
shift+F5 screenshot # File stream screenshot (video frame + render subtitles/signs)
ctrl+F5 screenshot window # Window screenshot (screenshot current player frame including OSD, shaders, upscale, etc.)

Cycle subtitle & audio like in MPC-HC:

s cycle sub
S cycle sub down # shift + s cycle backwards
a cycle audio

Cycle ASS subs style override:

u cycle-values sub-ass-override "yes" "force" "strip" "no"
ctrl+, add sub-scale -0.05 # Decrease sub size by 5%
ctrl+. add sub-scale 0.05 # Increase sub size by 5%

Cycle adaptive sharpen shader:
Note: Requires AdapSharp glsl in mpv conf directory. When AdapSharp is active sigmoid-upscaling needs to be no.

g change-list glsl-shaders toggle "~~/adaptive-sharpen.glsl"; cycle-values sigmoid-upscaling "no" "yes"; show-text "glsl-shaders='${glsl-shaders}'\nsigmoid-upscaling=${sigmoid-upscaling}"

Cycle deband on/off and weak/medium/strong:
Note: the ‘weak’ first set of deband values here needs to match the values set in mpv.conf or the value order will be messed up (in this specific case 2:35:20:5).

h cycle-values deband "yes" "no"
H cycle-values deband-iterations "2" "3" "4" ; cycle-values deband-threshold "35" "45" "60" ; cycle-values deband-range "20" "25" "30" ; cycle-values deband-grain "5" "15" "30" ; show-text "Deband: ${deband-iterations}:${deband-threshold}:${deband-range}:${deband-grain}" 1000

Visit the wiki or the default input.conf for more info.

mpv Custom Build

Here is a custom built mpv with Vapoursynth enabled (for Windows). Also includes a custom built FFmpeg in the folder. The folder is portable, and the settings are also in the portable folder. The OSC is also custom, delete osc lua and conf files if you want the default back.

Note: \portable_config\ folder takes priority over %APPDATA%\mpv folder. (i.e. \portable_config\mpv.conf will load instead of ~~\mpv.conf.)

Download the 7z or zip archive.

UPDATE: Newer mpv build 7z archive, includes newest Anime4K stuff. HOWEVER, the newer builds post v29 have an OSC bug where it fails to load my custom OSC half the time. If this annoys you, delete ~~\scripts\osc.lua and \script-opts\osc.conf and just use the default UI. I also included the older mpv.exe renamed as mpv.exe.old if you want the older stable executable back, though the newer version should offer better compatability with newer formats such as AV1 and HDR stuff. Important EDIT: Adding osc=no in the config file completely fixes this issue by forcing the original osc to be disabled, thus successfully loading the custom osc.lua. This line is not present in the current provided download so you will need to add it yourself.

Update 2: mpv build. Same as last update but with updated binary/ytdl from shinchiro and no-osc fix.

Anime Encoding Guide for x265 (HEVC) & AAC/OPUS (and Why to Never Use FLAC)

Table of Contents

x265 Settings Guide

All my encoding parameters are tuned for Anime at 1080p. TL;DR at the end.

Why x265?

x265 is a library for encoding video into High Efficiency Video Coding (HEVC / H.265) video compression format. While developed as a replacement for H.264, due to early performance and licensing issues, it didn’t gain much traction, especially as a streaming format. While most devices now support HEVC in some form or another, adoption didn’t really kick off. Despite these downfalls, HEVC does compress much better in higher crf (lower bitrate) and has been a good codec to use since 2018. Most modern devices support hardware decoding (iOS, Android, laptops, Macs, etc.), and even at 1080p most older consumer PCs should be able to software decode just fine.

In terms of efficiency as of right now (2019/2020), when properly encoded, HEVC could save ~25% bandwidth minimum compared to AVC (H.264, with its encoder known as x264) from my observations. HEVC is about the same quality as VP9, a codec by Google. HEVC does have a slight advantage in terms of parallel encoding efficiency, though they are both just as slow when encoding compared to x264. VP9 today is mainly used by Google (YouTube) via the webm/DASH format, which Apple refuses to support on iOS devices which is probably why it’s not widespread. Update: Apple now somewhat supports VP9 on some devices for 4K streaming (iOS 14+).

What I’m more excited about is AV1. AV1 is a next gen codec with members from many tech giants (Apple, Google, MS, etc.) to create a royalty free codec. IIRC some of its baseline is based on VP10, which Google scrapped and had its codebase donated to develop AV1. Unfortunately the reference encoder libaom is still in early stages of development and is rather inefficient. SVT-AV1 is Intel and Netflix’s scalable implementation. It performs (quality-wise) not as well as libaom, but quality is still better than x265 (SSIM), and encode times are much more reasonable. For now, they look promising, and I am excited to see results in a few years. HEVC took 3-4 years before it was more widely accepted by Anime encoders, and took 5 years until 2018 before I began experimenting with it. AV1 began major development in mid-2018 so by that logic we got 2-3 more years to go.

What tools do I use?

Handbrake is a great tool for beginners. It allows reading Blu-ray disks (unencrypted), and has a pretty UI to deal with. While it runs on an 8-bit pipeline, this isn’t an issue for Anime. Hopefully 10-bit pipeline will be supported once HDR becomes common. (Update: 1.4.0 is still under development but should be updated to use a proper 10-bit pipeline for HDR.) If you want FDK-AAC, you will need a custom compiled version. Guides on how to do this are easily found.

Other useful stuff for dealing with Anime (research them yourself):

  • MKVToolNix (MKV merge & Extract)
  • FFmpeg
  • jmkvpropedit
  • Notepad++ or similar text editor
  • Sushi
  • Batch scripting (.bat files)

Which x265 encoder? 8-bit, 10-bit or 12-bit?

To keep things short and not get into technical details: use the 10-bit encoder (Main 10 profile).

10-bit produces slightly smaller files while preventing banding from 8-bit compressions (this isn’t a joke, quantization and linear algebra is a mysterious thing).

Why not just use 12-bit then you say? Well, to put simply: moving form 8 to 10 bit increases color gradient available from 256 to 1024 to eliminate banding, but going to 4048 in 12-bit really isn’t noticeable. In fact, it is much less supported, and from my tests back in 2018 the 12-bit encoder is actually worse than the 10-bit encoder at high crf likely due to less resources put into developing it.

What preset should I use?

First, one must understand x265 is fundamentally different than x264. In x265, the slower the preset, the bigger the file size is at the same crf. While counter intuitive at first, this is due to the more complex algorithms used to more precisely estimate motion and preserve details. To put into perspective how important presets are in x265, a clip encoded at crf=16 preset=medium is actually worse in quality than crf=18 with preset=slow, while also being 50% larger in file size. One may argue SSIM isn’t the best representation as a quality metric, but the matter of fact is that it’s an objective measurement readily available. Subjective tests I’ve done (using Kimi no Na Wa & NGNL Zero) also agree with the above statement. The move from fast to medium to slow each reduces bad noise and artifacts, though I do have to admit going any slower I could not observe any significant improvements (x265 3.1 has a revised veryslow preset that changes this).

Now to answer the questions which preset to use. In my mind, there are only 3 presets worth using: fast, slow, and veryslow.

fast preset, which the quality is pretty “eh”, is the first “sweet spot” from the list and does serve a purpose for people with really weak systems. It is barely any slower than the others while offering slightly better quality. However, it is rather lackluster in my opinion and should never used to encode anime, especially those with dark/complex scene.

slow is the second sweet spot as you could see from the graph below, and should be the preset most people should use. Yes, I know it isn’t the fastest at encoding videos (typically ~10fps for a modern 6-core desktop processor @1080p), but it does offer superior quality especially with fast moving objects in dark scene.

Normally I wouldn’t recommend veryslow. It does, however, have its place, especially with the recent changes in v3.1. The veryslow preset is very useful at higher crf values (22+), with much better motion estimation at low bitrates. The only downside to veryslow is its encoding speed, which is a gazillion times slower than slow and requires a supercomputer. At lower crf values you barely get any improvements and thus could be ignored.

Presets vs. SSIM at 4K (source):


Include 1% low & Encode time:

Presets vs. file bitrate at 4K
preset size

Bonus conversation: Due to x264 being much less complex, presets can pretty much maintain only a slight loss in quality as you encode faster. This gives an illusion that slower presets are slower due to spending more time compressing. The more correct way to see this is that slower presets are slower due to doing more motion calculations and finding the best scheme that best describes the frame, which in x264 just so happens benefits compression too. However, x265 is waaay more complex with motion algorithms, meaning that accurately describing motion actually increases bitrate.

x265 Encoding Efficiency

For mainstream systems, just let x265 handle it automatically. For more advanced encoders that may have beefier CPUs, or even servers, this section may interest you.

x265 heavily favors real cores over threads, so keep that in mind if you use programs like process lasso. Theoretically, a 1080p encode with a default CTU size of 64 has an encode parallelization cap of 1080/64 = 16.875 threads. Beyond that x265 will not scale linearly. You could lower CTU size to 32, but you lose some compression efficiency (~1-5% depending on source complexity, personal tests show ~1-2% for anime @crf=19), especially with anime where CTU of 64 actually does benefit.

For those interested, x265 heavily utilizes AVX2 instructions, which runs on 256-bit FPU for optimal speed. Keep this in mind when researching CPU choices.

The more threads you add into a pool, the more encode overhead you will experience, since every row of encode requires the upper right CTU block to complete before it can proceed. When the upper right block is more complex and slows down, the next row has to wait. More rows = more possibility of waiting to happen. This translates to about 30-50% efficiency when encoding with additional threads, with the value gradually lowering the more threads you have. Learn more about frame threading.

According to someone’s 128 core Azure VM test on a 4K footage w/ preset=veryslow, encode scales pretty much linearly to 32 cores, and a bit dodgy at 64 cores. If we translate this to 1080p in theory scaling should be good up to 16 cores, which agrees with our earlier theoretical calculation of 16.8 threads.

Earlier we determined the theoretical limit of 1080p is about 16 threads. While at first glance an 8 core CPU should be the limit, remember that x265 favors real cores over threads. This means on a 16 core CPU, each encoding thread gets a real core to run on, not to mention there are also other processes in the encode chain that could use the extra threads. Due to x265 encode threads being terrible at sharing a core you still get good efficiency. Realistically speaking, if you really want to peg your 16 core CPU at 100%, I would run 2 instances of 16 thread encodes (you gain maybe ~5-10% efficiency), or lower the CTU to 32 to increase the theoretical limit to 33 threads. You can control thread count with the --pools option.

Those with CPUs that have multiple NUMA domains, look at –pools options on how to set x265 to run on a single NUMA node, then run as many instances as NUMA nodes you have with each instance running on each NUMA.

If the link to x265 documentation is broken, manually search it up. The link changes all the time when they update x265.

Encoding Parameters

Now that we established we should always use preset=slow, let us look at parameters that you may want to use/override to improve quality. For test clips, I recommend NGNL Zero and Tensei Shitara Slime episode 1 as they represent pretty much the worse case scenario for encoding anime (lots of dark scene, fiery effects, glow effects, floating particles, etc.).

QP, crf and qcomp

Note: x265 also has ABR (average bitrate) and 2-pass ABR encoding mode which I won’t get into. As a quick summary: never use ABR, and only use 2-pass ABR if you absolutely must have a predictable output file size. 2-pass ABR will be identical to crf assuming the result file size are identical and no advanced modifications are made to crf (e.g. qcomp).

Before we begin, we need to understand the basics of how x265 works. QP, a.k.a. quantization parameter controls the quantization of each macroblock in a frame. In QP encoding mode (qp=<0..51>), QP is constant throughout, and each macroblock is quantized (compressed) the set QP target. I do NOT recommend using QP encoding mode, which I will explain why in a bit.

Keyword: Quantization – lossy compression achieved by compressing a range of values to a single value. Higher quantization (QP) = more compression.

crf (crf=<0..51>), known as constant rate factor, encodes the video to a set “visual” quality. Keyword: Visual. The major difference between crf and QP is that QP encoding mode has a CQP (constant quantization parameter) whereas crf uses the QP as a baseline, but varies QP based on perceived quality by human eyes. Essentially crf can more smartly distribute bitrate to where it visually matters as opposed to QP encoding mode where it quantizes (compresses) constantly (constantly in terms of math, not to the eyes). For example, crf will increase QP in motion scene due to motion masking imperfections, while decrease QP in static scene where our eyes are more sensitive. Additionally, in crf encoding mode, QP can be further manipulated with AQ and PSY options (discussed later).

The obvious downside to QP encoding mode, and especially crf is that it is almost impossible to determine the output size, particularly when modifying options that manipulate QP may result in drastic file size differences. However, as opposed to 2-pass ABR, crf guarantees that no matter which episodes you encode, they will all be at the same visual quality across.

One should always encode their own test clips and determine what crf they prefer and can accept in terms of size vs. quality loss. However, as a general guide (personal opinion):

  • I have a small laptop / I watch on TV / 21″ monitor: crf=20-23
  • I have a 24-27″ monitor: crf=18-21 (crf=18 is my lowest recommended value for Anime, note that going below crf=18 may increase file size quite rapidly.)
  • I have a 4K 27″ monitor/TV in my face and I want minimal artifacts: crf=16-18
  • I have a 4K 27″ monitor/TV in my face and I determine video quality by pausing the video and using a magnifying glass (lol): crf=14

Jokes aside, video quality should be assessed by watching, and not by pausing the video. If you can’t see the flaw without pausing the video, is it really a flaw after all? 4K requires higher crf due to upscaling often amplifying artifacts, and high quality upscalers (especially ones like FSRCNN) often benefit from higher quality source.

The variability of crf can be manipulated with qcomp (quantizer curve compression factor; qcomp=<0..1>), but I recommend leaving it alone at default qcomp=0.6. qcomp isn’t as important of a variable as it was in x264 (since x264’s aq-mode=2 & 3 are pretty much broken). In terms of crf encoding mode, high qcomp leads to more aggressive QP reduction (higher bitrate) for complex scene. aq settings also affect qcomp somewhat. crf in x265 is somewhat a pain to tune and confusing to beginners due to tons of settings being intertwined controlling bitrate and quantization.

However, if you are encoding a source where it is mainly either simple scene or complicated motion, you can try increasing qcomp. Somewhere around qcomp=0.8 should be sufficient for even the most extreme cases. One interesting strategy to use for such source is to increase crf by 1 and use qcomp=0.8. This results in similar file size, but complex motion scene essentially gets allocated more bitrate than static portions. Beware of doing this to “average anime” as combining this with other quantization options (AQ, psy) incorrectly may lead to bitrate starving normal scene.

Now before I get to other parameters, it is possible to tune a crf=20 video to be better than one at crf=18. Raising crf isn’t end-all-be-all solution to everything so make sure to read the following sections and do encode tests yourself.


Allowed values are <0..16>. For anime just use 8. 8 has minor savings (~3-5%) over default 4 with a small encode time penalty (~5%), 16 is pretty much only useful for static images (BD Menu), and the encode penalty isn’t worth the saved space (1.6% smaller than 8).

For a crf=20 episode (~200-400MB/ep), expect about a few MB savings per anime episode going from 4 to 8. While at first glance it isn’t worth it, on the higher end (static-ish low-fps slice of life) you might be able to save 15-30 MB per episode (~5-10% savings on the very extreme end).

If you don’t want to use 8, 6 should be the sweet spot since around 5-6 is where consecutive b-frames drop to single digit percentages for most anime.

You can also encode an episode with bframes=16 and look at the encode log to optimize for your content.

Example encode log with bframes=16, values represent consecutive b-frames percent from 0-16 (notice the sudden drop at 6 b-frames and another drop to decimals at ~8+ b-frames):

x265 [info]: consecutive B-frames: 18.8% 10.2% 19.2% 12.4% 8.3% 15.9% 4.7% 3.1% 2.9% 0.8% 1.2% 1.7% 0.3% 0.2% 0.1% 0.1% 0.2%

bframe=n % file size of bframe=0 (Rokudenashi ep1 crf=19/17):
x265 bframe


According to legend the more the better. Consensus is that optimally you should use ref=6 for 10-bit anime encodes. x265 allows values of <1..16>, although 8 is the the “true maximum” x265 can currently use and any more actually doesn’t improve quality.

A study from late 2018 showed that going from ref=1 > ref=5 > ref=10 > ref=16 improved quality by 0 > 0.04 > 0.13 > 0.13 PSNR @720p with 100% > 135% > 179% > 223% encode time penalty. Note that 10 to 16 changed nothing due to true max capped at 8. For reference, we measure the difference of presets at the 0.x magnitude for PSNR.

Optimally you should use ref=6, although I personally stick with the default preset=slow‘s ref=4. Note that ref=6 is the max you can go if you enabled b-frames and --b-pyramid. Newer versions of x265 also blocks non-conforming values.

Loop Filters: sao, limit-sao, no-sao and Deblock

SAO is the Sample Adaptive Offset loop filter. SAO tends to lose sharpness on tiny details, but improves visual quality by preventing artifacts from forming by smoothening/blending. I would leave this on for crf>=20 (sao). crf=18 it depends on personal taste and anime, most of the time I set it to limit-sao. I would not turn off sao (no-sao) with crf any higher than crf=16 unless you are trying to preserve extremely fine grain/detail.

deblock specifies deblocking strength offsets. I tend to leave it at default deblock=0:0 or deblock=1:1. If you want to preserve more grain/detail, you may set it to -1:-1. Please note in FFmpeg based programs you will need to type deblock=0,0 to pass the values, as : is a parameter separator.

Psycho-Visual Options: psy-rd and psy-rdoq

Traditionally, the encoder tends to favor blurred reconstructed blocks over blocks which have wrong motion. The human eye generally prefers the wrong motion over the blur. Psycho-visual options combat this. While technically less “correct”, which is why they are disabled for research purposes, they should be enabled for content intended for “human eyes”.

psy-rd will add an extra cost to reconstructed blocks which do not match the visual energy of the source block. In laymen’s terms, it throws in extra bits to blocks in a frame that are more complex. Higher strength = favor energy over blur & more aggressively ignore rate distortion. Too high will introduce visual artifacts and increase bitrate & quantization drastically.

psy-rdoq will adjust the distortion cost used in rate-distortion optimized quantization (RDO quant). Higher strength will also prevent psy-rd from blurring frames.

If you didn’t understand any of that, don’t worry. Basically, these 2 options are crucial to QP manipulation and grain/detail preservation. psy-rd will decide the tendency to add extra cost (bitrate) to match source visual energy (i.e. grain, etc.) and psy-rdoq will restrict blur favoring higher energy. Too low and details will be blurred to improve compression (the reason why people hated x265 in the early days), too high and you create artificial noise and artifacts. Note that psy-rdoq is less accurate than psy-rd, it is biasing towards energy in general while psy-rd biases towards the energy of the source image.

For anime, use psy-rd=1. On anime with some grain/snow/particles, or lots of detailed dark scene (often anime movies), set psy-rd=1.5 (e.g. Kimetsu no Yaiba). If grain is a main feature, or the whole series is dark, well mastered with details you may use psy-rd=2 (e.g. NGNL Zero has lots of fallout dust and complex details throughout the whole movie).

psy-rdoq is the key to preserve grain (and also quite aggressively lowers QP). Keep in mind the --tune grain x265 built-in actually has too high of a value for slower presets, as it actually artificially creates even more grain. For anime, I would leave it at default psy-rdoq=1. With some grain/CRT TV effects, I would set it to psy-rdoq=2 or 3 depending on how strong the effects are. For anime where grain effects are staple throughout, or to eliminate blocking in complex fast motion scene at lower crf (<16) you may increase the value to 45. Additionally, some anime use grain to prevent blocking/banding and may also need a higher value to prevent micro-banding. Note that these values apply to preset=slow. A higher value may be needed for faster presets.

Keep in mind both these options drastically increase file size, but also improve visual quality. On lower bitrate encodes, having too high of psy-rd may starve bitrate from flat blocks, and too high psy-rdoq may also create artifacts.

I recommend psy-rd=1 and psy-rdoq=1 for most of the anime out there. I sometimes use psy-rd=1.5 and rarely ever go to 2. I rarely use psy-rdoq >2 due to how much bitrate it increases, and due to it technically being less accurate has it induces noise from quantization decisions unlike rd which induces noise based on the source. (To me, its not worth increasing the value to make that 10 second grainy scene look better. I only increase it when the whole show/movie has grains/grain-like objects).

Read more about it on

Note: Lowering pbratio and ipratio may also improve grain retention (more “real” frames over b/p frames), although I do not recommend touching them.

Adaptive Quantization Options: aq-mode and aq-strength

aq-mode sets the Adaptive Quantization operating mode. Raise or lower per-block quantization based on complexity analysis of the source image. The more complex the block, the more quantization is used. This offsets the tendency of the encoder to spend too many bits on complex areas and not enough in flat areas.

As this is beneficial for anime, you pretty much want this enabled. As for the modes:

  • 0: disabled
  • 1: AQ enabled
  • 2: AQ enabled with auto-variance (default)
  • 3: AQ enabled with auto-variance and bias to dark scenes. This is recommended for 8-bit encodes or low-bitrate 10-bit encodes, to prevent color banding/blocking.
  • 4: AQ enabled with auto-variance and edge information.

I highly recommend, in fact I think it pretty much is a must to use aq-mode=3 for anime. It raises bitrate in dark scene to prevent banding. Seasoned encoders will know dark scene with colorful glowing effects (i.e. fire) and dark walls with subtle colors are most prone to banding, blocking, and color artifacts. Setting aq-mode=3 is so beneficial to anime that a crf=20 encode with it looks better than a crf=18 encode without while having similar file size.

aq-strength is the strength of the adaptive quantization offsets. Default is 1 (no offset). Higher = tendency to spend more bits on flatter areas, vice versa. Setting <1 in crf mode decreases overall file bitrate and reduce spending bitrate on plain areas (but potentially introduce blocking/banding in higher crf). For anime, anywhere from aq-strength=0.7 to aq-strength=1 is acceptable depending on the show. I tend to leave it at default unless I feel that aq-mode is spending too much bits. Setting this high sometimes helps with grain preservation, but very expensive bitrate wise and may cause halo artifacts.

aq-motion and hevc-aq are experimental features that are still broken but should be interesting to use in the future (unless AV1 beats it to the punch). From my tests hevc-aq lowers overall VMAF slightly but increases 1% low.


Prevents bilinear interpolation of 32×32 blocks. Prevents blur but may introduce bad blocking at higher crf. I do not use this option on my encodes. For non-anime stuff, this option may help preserve small details from blurring such as hair. I recommend not using this option unless you have no-sao and low crf as sao has a bigger impact on blur.


Disable analysis of rectangular motion partitions. rect is enabled for presets lower than slow. Enabling rect may help improve blocking in challenging scenes. For preset=slow, disabling saves ~25% encode time at the cost of 1-3% compress efficiency.

I recommend not touching rect as this is the main difference between preset medium and slow, unless you really want to save the encode time. Do note that your video quality will decrease ever so slightly.


I advocate to always encode directly from the blu-ray disk as you avoid re-encoding. Re-encoding (or to be more technical: generation loss) is very destructive for video quality, even more so than re-encoding audio.

However, not everyone could afford blu-rays and rip them manually. If you are forced to re-encode (i.e. you got your video files from cough), ensure you have the highest quality encode possible and enable constrained-intra to prevent propagation of reference errors. For re-encodes I would not go below crf=20 as any lower simply isn’t worth it.

Keep in mind this isn’t a magic parameter to remove artifacting from re-encodes. Edges, especially edges close to each other (e.g. hair) tend to have jpeg-like artifacts in between. Dark scene also suffer (since they’re usually the most artifact prone in an anime encode). Any artifacts from the source will also likely be amplified.

Note: Very rarely, but happened once to me, constrained-intra might cause encoding errors that look similar to missing p-frame data.


Number of concurrently encoded frames. Set frame-threads=1 for theoretical best quality and ever so slightly better compression. If you have <=4 core CPU you may consider this option. High core count systems will suffer greatly (encode speed) if set to 1. I found really no quality loss setting it to >1. As for the “max” to set before losing quality, from my test setting it from 2 to 16 yielded identical results to each other contrary to claims that setting it >3 hurts quality. Basically just let your system handle this value unless you really want to encode with frame-threads=1.

x265’s Biggest Flaw: Grain, Micro-Banding, and Grain-Blocking

As we discussed, x265 has a tendency to blur/smoothen to save bitrate. While this can be mitigated somewhat with psycho-visual options, encoders should be aware of what I call the “Miro-Banding” and “Grain-Blocking” phenomenon. As we know, banding in x265 mostly occurs in dark scene. To combat this, many studios are starting to inject dynamic grain to prevent this in AVC 8-bit BD encodes (Increasingly prevalent post 2018/2019). While extremely effective due to BD’s having very high bitrates, this is actually detrimental to higher crf x265 encodes. On a smoky/fiery grainy scene, x265 tends to smoothen out each block creating weird “patches” of regional grainy block (I call this “grain-blocking”, though do note it isn’t “blocking” per se and more of “regional encode-block” color difference that still has a smooth gradient across). For example, the intro scene in ep1 of Tensei Slime exhibits this problem with smoke and fire effects.

x265 also introduces micro-banding when smoothening out the bright objects in the dark with glow effects, and is even more noticeable (slightly wider) when the source has dynamic grain. These micro-bands aren’t conventional banding, but extremely thin bands that only appear in dark scene objects with strong color gradient change (e.g. edges of fire where color rapidly changes from white to red to yellow glowing then to dark grey in a short distance, or a glowing katana swinging sword effect). Micro-banding are a bigger pain as even stronger deband filters cannot smoothen it out during post-processing (i.e. when watching), and unlike grain-blocking where tuning psy values can easily fix it, micro-banding requires a much lower crf value on top of that to suppress. Luckily, micro-banding is much more rare in encodes and that few seconds in a movie is unlikely to harm viewing experience.

There is no simple answer to fix these 2 problems due to crf targets. Traditionally in x264 such scene will simply end up in blocking artifacts. x265 chooses to eliminate artifacts at the cost of detail loss. The down side is that even at lower crf targets it is tough to eliminate x265’s tendency to blur. To truly eliminate such effects, you will first need no-sao:no-strong-intra-smoothing:deblock=-1,-1 to make x265 behave more like x264, then raise psy-rd and psy-rdoq accordingly (2 and 5 respectively should do the trick). However, this reintroduces unpleasant artifacts x265 aimed to eliminate in the first place, thus I do not recommend such encode parameters unless encoding crf <16 (in which case file sizes are so big just use x264, why even bother x265?).

Image set to greyscale, with color/gamma corrections to amplify the artifacts. Micro-banding is self-explanatory. For grain-blocking, you can see each block still retains the grains, but the average color for each block is different creating a “regional” grain-block effect.


Unfortunetly I am not familiar with AVS and VPY scripting and cannot give advises on how to do it. However, x265 benefits greatly from filtering and can avoid its flaws with proper filters, such as proper denoise with masking and custom deband shaders tailored to different encodes. x265 also has a tendancy to magnify aliasing so AA scripts should benefit encodes too.

TL;DR Summary for x265 Encode Settings

Set preset=slow. Then choose 1 following to override the default parameters. These are my recommended settings, feel free to tune them.

  • 1 Setting to rule them all:
    • crf=19, limit-sao:bframes=8:psy-rd=1:aq-mode=3 (higher bitrate)
    • crf=20-23, bframes=8:psy-rd=1:aq-mode=3 (more compression)
  • Flat, slow anime (slice of life, everything is well lit):
    • crf=19-22, bframes=8:psy-rd=1:aq-mode=3:aq-strength=0.8:deblock=1,1
  • Some dark scene, some battle scene (shonen, historical, etc.):
    • crf=18-20 (motion + fancy & detailed FX), limit-sao:bframes=8:psy-rd=1.5:psy-rdoq=2:aq-mode=3
    • crf=19-22 (non-complex, motion only alternative), bframes=8:psy-rd=1:psy-rdoq=1:aq-mode=3:qcomp=0.8
  • Movie-tier dark scene, complex grain/detail, and BDs with dynamic-grain injected debanding:
    • crf=16-18, no-sao:bframes=8:psy-rd=1.5:psy-rdoq=<2-5>:aq-mode=3:ref=6 (rdoq 2 to 5 depending on content)
  • I have infinite storage, a supercomputer, and I want details:
    • preset=veryslow, crf=14, no-sao:no-strong-intra-smoothing:bframes=8:psy-rd=2:psy-rdoq=<1-5>:aq-mode=3:deblock=-1,-1:ref=6 (rdoq 1 to 5 depending on content)

Side note: If you want x265 to behave similarly to x264, use these: no-sao:no-strong-intra-smoothing:deblock=-1,-1. Your result video will be very similar to x264, including all its flaws (blocking behavior, etc.).

Audio Codecs Guide

Why you should never use FLAC

Note: If your source isn’t FLAC/WAV/PCM, always passthrough (-c:a copy) to prevent generation loss.

FLAC, to put it simply, is very inefficient use of data. A typical anime (23-24min) episode will have a FLAC audio size of 250MB. Now compare to an AAC track of merely 20-30MB with basically no quality loss. And then there’s the issue of anime audio track being mastered well in the first place…

The preserving audio quality argument has always been the dumbest argument I’ve ever seen. I feel that this is partly due to the “audiophile” community blowing this issue out of proportion. Just yesterday I was on cough and saw a 24 episode cough that is 12GB large. The encoder argues that at this size the video had “barely any quality loss”. It was a re-encode at crf=22.5, with zero encoder tuning and preset at fast. Needless to say, it was blocking and artifacts galore. The icing on the cake? The audio was FLAC “to preserve audio quality”… I’m pretty sure there are more people in this world with 1080p screens than high-end headphones lol. The biggest mistake really wasn’t encoding with crf=22.5. Imo, it was the really imbalanced release. By simply using AAC you can allocate extra 4GB towards video quality with 99.9% people not notice any audio quality loss.

Note: The only exception to using lossless codecs is:

  1. You are in production using lossless codecs to prevent generation loss.
  2. Archival.
  3. Providing remux/RAWs for cough.

Being an audio enthusiast myself, I am a firm believer of double blind ABX testing, and I encourage people to find the lowest bitrate they can go before being able to discern the difference with this method.

I later provide audio samples (see Encoders Comparison section) from various encoders for people to listen. A great ABX tool I found is the ABX plugin for foobar2000 with replaygain enabled.



Ideally, one should use opus. It’s a open codec, and outperforms even the best AAC encoders at very low bitrates. CELT/SILK encoding mode switches on-the-fly with voice activity detection (VAD) and speech/music classification using a recurrent neural network (RNN), ensuring the best encoding method depending on content.

It also performs exceptionally well with surround sound. Say we want to target an equivalent quality of 128kbps stereo track (64kbps x 2 channels) on a 5.1 channel setup. Conventional codecs such as AAC simply use 64kbps x 6 channels = 374kbps (usually slightly less due to LFE & C channel being lows/vocals only).

OPUS on the other hand can achieve similar quality with a much lower bitrate, the recommended formula being (# stereo pairs) x (target stereo bitrate). In this case with a conventional 5.1 setup (L & R, C, LFE, BL & BR), we have (2 stereo pairs) x 128kbps = 256kbps. You may need to use the conventional formula if your stereo pairs aren’t “stereo” pairs, i.e. the audio is significantly different.

This is due to OPUS using surround masking and takes advantage of cross-channel masking techniques to smartly distribute bitrate. Think of HE-AACv2 but better and optimized for higher bitrate + surround setup. Instead of distributing 64kbps per channel, more bitrate goes to the stereo pairs and less to the center/LFE channels where only vocals/bass exist. Next, by utilizing joint encoding (intensity stereo) and other techniques it “increases” the “bitrate” per channel. Obviously this is an oversimplification and the underlying technology is way more complex, but you get the idea.

Sounds cool, right? Why not just always use this sci-fi level magical codec then? Well you see, we already do, but in the form of commercial implementation (FaceTime audio, Discord, Skype, etc.). Opus is not widely supported in many container formats (only in .mkv, .webm, .opus (.ogg), .caf (CoreAudio format)), especially in the video world, essentially limiting users to those using 3rd party players. This leaves us…


AAC. An old codec developed to kill mp3 and they (mp3) still exist for some reason. AAC is a widely supported codec just like mp3 was as its replacement. If a device can play music, 9/10 it supports AAC. Quality is about the same as Opus on higher bitrates. There’s also HE-AAC that’s used in low-bitrates (~64kbps) with spectral band replication (SBR) and HE-AAC v2 with Parametric Stereo (PS) that’s used in even lower bitrates (~48kbps).

I personally think that you shouldn’t use HE-AAC, especially He-AAC v2 as it’s pretty obvious with a decent studio monitor/headphone. If such low bitrate is needed, Opus is also much better.

Now that we’ve established AAC is the way to go for most users, now to the bad news: unlike Opus with 1 definitive official open source encoder, there are many encoders developed by many corporations for AAC, and some aren’t “free”.

Just like licensing issues plaguing HEVC, the same can be seen in the AAC world. This means without some computer knowledge, it’s pretty hard to your hands on good AAC encoders.

Now to introduce the 4 most prominent AAC encoders:

The first is qaac, an tool utilizing Apple’s CoreAudio toolkit to encode Apple AAC. Mac users: you have direct access to the CoreAudio library and do not need any special tools. Apple AAC is know to be the best consumer available AAC encoder.

Second in place is is Fraunhofer FDK AAC, developed by Fraunhofer IIS, and included as part of Android. While open-source and once used in FFmpeg, due to licensing issues it has pretty much disappeared from any builds. To get it back, you need to compile FFmpeg yourself and enable the non-free flag. Once compiled, the build cannot be shared or distributed.

(There’s also the FhG AAC encoder from Fraunhofer bundled in WinAmp but it’s another complicated topic for another day. Beginners to using AAC, just pretend it doesn’t exist.)

Third place is Nero AAC, once pretty prominent in the AAC world as it was provided by Nero themselves in their software. It is currently outdated and should not be used.

Last is the FFmpeg 3.0 encoder. AAC encoder in FFmpeg used to be trash, even at 128kbps it was hissing all over. The new improved encoder has “eh” quality and can be widely used with 1 caveat: it’s CBR ready only. The VBR is still experimental (although my tests show that it is not any worse than CBR, a.k.a. they’re both “eh”). I do not recommend any less than 256kbps (as you will see later).

Encoders Comparason

My MEGA page contains all the audio file samples (backup link). Feel free to download and listen/ABX them. I chose the Z*lda theme song orchestra due to its challenging nature: brass instruments and cymbals. If anything will show a flaw in low bitrates, its going to be the trumpet. Basically, if a certain encode setting can handle this track, it can handle everything.

Encoder Size (Bytes) Bitrate (kbps) Peak Bitrate (exclude FDK initial) FDK Burst
FLAC 38745489 899 1201
FDK VBR Q1 HEv2 1341425 31 40 41
FDK VBR Q2 HE 2973163 68 101 142
FDK VBR Q2 4051211 92 134 201
FDK VBR Q3 4681758 107 144 208
FDK VBR Q4 5812754 132 179 219
FDK VBR Q5 9925912 226 302 351
OPUS VBR 48kbps 2131368 49 81
OPUS VBR 64kbps 2841338 65 102
OPUS CVBR 64kbps 2792245 64 66
OPUS VBR 96kbps 4278459 97 131
OPUS VBR 128kbps 5698682 131 167
OPUS VBR 192kbps 8409104 194 234
qaac CVBR 40kbps 1794573 41 54
qaac CVBR 64kbps 2847934 65 86
qaac VBR Q27 3081110 70 91
qaac VBR Q64 5760371 132 154
qaac VBR Q91 8882577 205 227
qaac VBR Q109 11767124 272 296
qaac CVBR 256kbps (iTunes) 11852276 274 310
FFMPEG VBR Q0.5 2701485 61 75
FFMPEG VBR Q1.0 4527582 104 133
FFMPEG VBR Q1.5 6049106 139 185
FFMPEG VBR Q2.0 8727091 202 249



FDK AAC is probably the most accessible AAC encoder. It is built into FFmpeg and HandBrake (both require manual non-free compiling, go check out guides on the official site / videohelp / reddit how to compile HandBrake with FDK AAC. For FFmpeg use media-autobuild-suite. It being only ever so slightly behind qaac makes it the perfect encoder for ripping Blu-Rays without demux/remuxing.

Keep in mind anything other than Q5, all will have a low-pass filter, with it progressively lower the lower the bitrate is. I recommend using VBR Q5 due to its pretty high bitrate (~100kbps/channel), and to prevent the low-pass filter.

An interesting thing is that FDK tends to not respect VBR quality goal as well (i.e. VBR Q2 has a theoretical ~64kbps goal, but the result file is 92kbps), and allocate bits when it knows Q2 is simply too low for a specific file. This really isn’t an issue as on average FDK Q2 is indeed ~64kbps, just more of an FYI that FDK WILL throw more bits if a complex track needs it.

One “quirk” of FDK AAC is that it tends to allocate bitrate at the beginning for music (this is not a problem for anime as shown later) when it detects a complex beginning. This is why I do not recommend FDK AAC for music files as it tends to waste bits at the beginning adding up slowly. On the bright side over-allocation doesn’t impact quality (it “increases” actually haha).


Apple AAC (qaac)

While in a perfect world everyone should use the coreaudio encoder, the matter of fact is that it requires demuxing the file making FDK the best in terms of workflow in FFmpeg based programs. Some tools like Staxrip processes files as such, but for most this is very inconvenient. (Mac users: CoreAudio encoder can directly be used by programs like HandBrake).

Apple AAC is more suitable for music (as expected, since it’s the encoder used for iTunes). When encoding quiet tracks (such as movies) it often undershoots its bitrate target by quite a bit (which can be mitigatged by using a higher TVBR target or simply using CVBR). I really have nothing to say about it other than it’s really good, especially at low bitrates.

Apple AAC tends to respect bitrate targets much more than FDK AAC (with the exception of ultra low bitrates). It also favors allocating bitrate to music over speech.

Constrained VBR (CVBR) mode in Apple AAC constrains the minimum value to not go too low, but does not limit the upper value like that on Opus. Keep in mind CVBR encodes identical to TVBR if bitrate doesn’t drop below threashold. Funny enough, in the context of music tracks, CVBR behaves very similar to FDK AAC with an initial burst.



Not much really to say here. You really have to experience it yourself to understand what I mean by Opus really is the next gen codec. It pretty much has no major flaws, and multiple public tests have proven Opus to be pretty much the best encoder out there.

Its 2 self-switching encoding modes SILK/CELT also makes it perfect for both music and speech/vocal.

The only real down side to Opus (other than format support), is that it doesn’t have a quality mode and requires specifying a bitrate. For example, qaac Q91 has a target of ~192kbps. However, on complex tracks, it would not hesitate to go much higher just like the sample track provided. On tracks such as slice of life anime where much is just voice and quietness, the result bitrate will be much lower than 192. On the other hand, Opus pretty much always produces ~192kbps file. Additionally, qaac quality mode will base bitrate on per channel, whereas OPUS bases bitrate per audio track, meaning for OPUS you will have to manually set double/triple the bitrate for 5.1/7.1 surround sound audio encodes.

For streaming companies, Opus’s ability to respect bitrate so well is a huge advantage for networking and storage problems. For consumers like us however, this is a disadvantage as a quality mode would serve us much better. (Think of quality mode like crf in video encoders, huge file-to-file variation but ultimately equivalent quality for each file).

CVBR mode in Opus works different than those in AAC. Think of it as a VBR mode for CBR. It’s basically constant bitrate, but with a bit wiggle room for very small momentary bursts.

Opus also has a “soft” low-pass filter from 16-20kHz, and starts becoming progressively aggressive <96kbps. I say “soft” because it isn’t a hard-limit, but decided by the encoder depending on content. For example, in my test track even as low as 48kbps, when even HE-AAC is low-passing <13kHz OPUS still allows trumpet harmonics up to 20kHz. Opus also momentarily boosts VBR to ~50-55 kbps as the encoder smartly determined that low-passing the trumpet will be detrimental to the overall quality.

Opus vbr 192

FFmpeg 3.0

Holy Jesus it’s bad. Anything lower than 256kbps produces farting noises with the trumpet (both CBR and VBR). Usually it isn’t this bad, but now you know why I chose this track to compare encoders.

On the bright side their VBR algorithm is pretty much spot on. Though according to their website VBR is still experimental and should not be used.


Encoding Audio for Anime

FDK AAC vs. qaac


Using Danmachi episode 1 as an example, you can see how both are really more similar than different. The only real takeaway is that qaac’s quality mode differs drastically with different content and really favors music (Q91 is ~192kbps for music, but in this case it undershot the target by a 30kbps margin). After using a higher quality target for qaac to compensate, you can see both have very similar bitrate allocation and variability, with the exception of the ED where qaac increases the bitrate compared to FDK AAC.

Opus vs. qaac


As both AAC and Opus are fundamentally different, we can’t really draw any conclusions from this.

One thing to note is that Opus really does respect bitrate targets VERY well. Oh and again, qaac loves to allocate bits to music over vocals (OP and ED peak).

All numbers in kbps.

Encoder Stereo 5.1 7.1
OPUS 192 384 512
qaac (Apple AAC) -V 109 -V 100 -V 91
FDK AAC -m 5 -m 5 -m 5
FFmpeg 3.0 384 CBR DON’T BOTHER
OPUS Remember to enable the --vbr switch unless you need CBR. If your audio isn’t a conventional channel layout use the traditional (# channels) x 96kbps formula, else use (# stereo pairs) x 192kbps. Use 64kbps/channel or 128kbps / stereo pair if more compression is desired.
qaac (Apple AAC) Use capital -V for true VBR. -V 109/100/91 is approx 128/112/96kbps per channel. -V 82/73/64 (~80/~72/~64 kbps/channel) can be used if more compression is desired. qaac tends to aggressively save bitrate for non-music content, thus a higher VBR mode is recommended to compensate.
FDK AAC Always use VBR Quality 5 to avoid low-pass filter. Q5 is approx 100kbps per channel. You may use Q4 (~64kbps/channel) if you don’t mind the low-pass filter.
FFMPEG 3.0 I really do not recommend using this audio encoder unless it is stereo and a final compression render to upload to sites such as Youtube.

Here are my general rule of thumb for audio quality vs. gear:

64kbps/channel for budget chi-fi gear (<$100 USD).
96kbps/channel for mid-fi gear (~$300-500 USD audio setups).
SnakeOil/channel for hi-fi gear ($1000USD+ setups). Jokes aside double-blind ABX test your limits, though I doubt anyone can differentiate past 160kbps/channel.


Thanks for reading!

Basically yeah, follow this guide and your encodes should be good. This is the end of the guide, but if you want to read my ranting about audio gear and stuff keep reading.

Bonus: Ranting on Audio Gear and Recommendations

I really do wonder to what extent people are able to ABX and discern the difference between compressed files and lossless. I personally have trouble beyond >96kbps stereo for most sources and beyond >160kbps I literally cannot tell the difference. I only own an HD 650 and Etymotic ER2/3/4XR so I do wonder what people with better gear can hear.

My journey into the “audiophile” world

Skip this section if you aren’t into my life story.

In middle school, just like everyone out there I owned a pair of V-shaped generic $20 IEM. The forgotten days where I though muddy bass=good, the days when a +20db bass boost was about right.

High school is when I had the first taste of “real” audio gear. My first IEMs were the RHA MA750s. Classified as a “warm” IEM with a ~10db sub-bass boost and bright highs, it was one of the best at its time in the $100 price range. Initially I though it was really bass-light (lol), and the sharp 10K made me not like it as much.

In University, following the trend with everyone I got the infamous HD650s, and also ventured into the tube amp world (full of regrets, tube amp = money pit, coloured sound, and eh detail retrieval on most tubes). Here is when I got to experience what ‘soundstage’ is, got more used to a ‘natural’ sound signature.

While searching for an IEM upgrade after my RHA crapped out, I stumbled upon Etymotic’s ER3 series. I always knew that they were the benchmark for studio IEMs, but never really though much of it until one faithful day for some god damn reason I ordered the ER3XR to try it out.

To my pleasant surprise, after spending a month with it and getting used to a neutral sound signature, I am really impressed. Listening to classical on it is nothing compared to other IEMs in terms of timbre and accuracy, albeit the single BA does sound a tad ‘dry’ sometimes.

Sound “Quality”

My biggest complaint/gripe with the audio world is that many tend to associate “sound signature” with “sound quality”. So once again, I would like to scream at the world:


Fun fact, stronger bass actually worsens sound quality due to the stronger bass often creating distortion worsening harmonic distortion (THD) measurements.

(Below info relevant as of early 2021)

Part 1: IEMs

Beginners looking for recommendations for audio gear (IEMs): I highly recommend Etymotic’s ER2, ER3, and ER4 series, specifically the ER2XR for beginners (Diffuse-field flat with a +5dB bass boost). They definitely aren’t for everyone with their house-sound (Diffuse-Field Target w/ slightly weaker treble). However, for those who want a taste into what a truly neutral IEM sounds like, these are unbeatable.

For those not into Diffuse-Field tuning, I recommend finding headphones tuned to the Harman/modified-Harman 2017/2019 IEM target (more “mainstream”). I haven’t been keeping up with what’s best, but for the lower budget people chi-fi (Chinese-fi) is the way to go. Some recently popular brands in the chi-fi are Tin, KZ, FiiO, BLON, Moondrop, etc. Moondrop is especially hot on the radar these days, and other than it’s tuning, I can also attest to it’s sound quality from display units I tried.

Fun fact: Etymotic invented insert headphones.

There are 2 variants within the ER2/3/4 series, the SE/SR and XR (i.e. ER3XR). SE/SR is the studio version, and the XR is the bass-boosted (+3db, or +5dB for ER2XR) variant. If this is your first time venturing into the neutral sound signature, get the XR variant. Get the SE/SR if you want a truly FLAT bass. Note that since the highs aren’t boosted like most mainstream IEMs, the bass is surprisingly present due to other frequencies not drowning it out.

The ER2 are the cheapest (~$125 USD) of the bunch and use Dynamic drivers (DD). They have almost identical frequency response to the ER3, but due to the dynamic drivers they have better sub-bass (2dB stronger), sound more natural, and bass packs more of a punch (likely from the slower decay of DD). If you listen to mainstream music, these are the ones to pickup. The value these offer are quite amazing at $125. I personally prefer the SE (non-bass-boosted) version.

ER3 use a single Balanced Armature driver (BA) and are suited for people who listen to classical and the like, as it has better detail retrieval at the cost of bass sounding uh… a bit unnatural (difficult to describe, best way to put it is that it lacks impulse and dynamics). The ER3s are the best to get into the Etymotic house sound (~$130-160 USD). These are fine with non-fast bass (bass guitar, non-synthetic bass drums, etc.), but once a track gets too complex (i.e. in EDM or metal when every instrument plays + heavy bass hits) the single BA sometimes gets overwhelmed and bass starts bleeding into the mids. Those who are used to DDs might find BAs sound a bit ‘dry’. I personally recommend the SE version over the XR version for the ER3 due to it sounding more natural and less prone to bass bleeding into mids.

ER4 are the professional version of the ER lineup, and have a legendary history. They are manufactured in the States, with each unit having its FR and channel match certification. Get these if you want the best Etys can offer. They are a small upgrade from the ER3 so I do not recommend the ER4s if you already own the ER3s. For the ER4 I recommend the XR over the SR variant which is opposite of the ER3 recommendations (reasons are complicated, but the SR is perfectly fine if you don’t mind the flat bass).

The Etys use their infamous triple-flange eartips that may not be everyone. Fortunately due to their long nozzle design using foam tips do not alter the frequency response (Innerfidelity has proven this). I personally use Comply foam tips with them. There are also many aftermarket tips that fit them such as the Spinfits CP-800.

Totoally not payed by Ety to say this.

Part 2: Full-Sized Headphones

As a brief summary: closed back headphones have better noise isolation, open back leak noise but often have better soundstage.

I haven’t been keeping up in this market segment for years, so it’s up to you to research. Some big name brands in no particular order: Sennheiser, HiFiMan, Audeze, Beyerdynamic, Audio Technica, AKG, Grado, Fostex (LOTS of derivative ‘brands’ from modded T50 series), Focal, Philips, Sony, etc.

One thing to keep in mind is that full-sized headphones are often harder to drive (especially planers, don’t let the low impedance trick you) and require a dedicated amp.

Part 3: DAC/AMP

Note that new products come out every month so you should always do your research.

Also be very careful, other than cables, DAC/AMPs are the most prone to snake oil claims. Many might know the legend NwAvGuy who one day stormed into the scene, created an open-source Objective 2 AMP/DAC design that outperformed competition 10x its price then disappeared without trace. His story is a long one reserved for another day, but thanks to him modern gear are mostly more about objective measurements than subjective claims.

I usually search for an amp that is suitable for 1.5-2x my headphone impedance. This is due to headphones impedance are measured at 1KHz, whereas often the impedance isn’t linear throughout (stares at Senns). (Example: HD650 is rated at 300 ohms but its peak resistance is 500 ohms at 80Hz.) Update: RIP innerfidelity, hopefully someone has that link archived somewhere

Portable DAC/AMP rarely are powerful enough to drive 300 ohm class headphones. Amps that do are probably bad at driving low impedance IEMs (volume matching, output impedance problems, etc.). It’s basically pick your poison and finding you needs.

For IEMs: Your goal is to find an amp that is clean (high SINAD) and quiet (low noise-floor). Try find the lowest output impedance possible using the 1/8 (or 1/10) rule: the output impedance of your amp should be 1/8 or 1/10 (depending on who you ask) of your headphone impedance. Typically this means looking for <=1 ohm for IEMs. Portable amps often work well due to them being battery powered (clean DC power source). Digital DAC volume control is a plus to ensure channel matching.

For high-impedance headphones: Your goal is to find something that does well at high gain with low distortion. Output power calculators (Site 1 & 2) are your friend. You often find people complaining about weak/flabby/distorted bass on such headphones and a weak amp is probably the cause.


Crossed out some devices. I have not been keeping up with the scene now that I’ve found my end-game: the Apple Dongle. Yes, I’m dead serious.

  • Uber Budget: Apple USB-C to 3.5mm Dongle ($10)
    • No, this isn’t a joke. Make sure you get the US version, as the EU version is weaker and doesn’t measure as well due to reasons (EU volume limit shenanigans). These measure insanely well for $10 (99dB SINAD / 113dB SNR), and are perfect for even end-game IEMs due to it being really clean and non-existent noise-floor. Suitable for <50ohms IEMs / non-planar headphones. Can be used on Windows with no problems, although Android users may have volume issues due to a config bug (can be mitigated by using exclusive mode such as USB audio driver app).
  • Bluetooth: EarStudio ES100 ($100) Not sure what’s good these days.
  • Budget Desktop DAC/AMP Combo: FiiO K3 ($100)
    • Designed to replace the infamous E10K. While measurements aren’t the best in 2019, it packs a lot of functionality (Optical, RCA, etc.) and is pretty good for its price for an all-in-one. Also has an actually good 6db bass-boost switch for watching movies (6db bass boost on D-F tuning is close to Harman target’s bass’s 7db boost). Recommended for <150 ohm non-planar Headphones and IEMs.
  • Budget Desktop DAC/AMP Combo (More Power): FiiO K5 Pro ($150)
    • Has a surprisingly good AMP and OK DAC. Can even drive the HD650s well. Basically a much more powerful desktop K3 that can drive ~300ohm / planars with no problem.
  • Budget Portable DAC/AMP Combo: Topping NX4 ($160)
    • Pretty much better measurements than the FiiO K3. Can even drive the HD650s. However, QC isn’t as good as compared to FiiO.
  • Mid-Range Desktop DAC/AMP Combo: Topping DX3Pro ($220)
    • A really good all-in-one unit with good measurements. Not as good as separate DAC AMP units but for functionality (BT, preamp, etc.) and desk space friendliness it is unbeatable. If you get the newer v2 LDAC version, unfortunately its output impedance is 10 ohms so make sure your headphone is >80 ohms (1:8 rule).
  • Mid-Range DAC & AMP units: JDS Atom AMP/DAC ($100/$100), Schiit Heresy ($100), Grace SDAC ($79), Khadas Tone Board ($100), or similar products (e.g. SMSL/Topping DAC/AMPs)
    • These are popular entry-level single units. There are also many good Chinese DACs in the $100 price range, although it might be more of a hassle to acquire one (Aliexpress, warranty issues, etc.).
  • Upper-Range SE DAC/AMP: JDS Element II ($399)
    • The AMP unit is very good (the Atom was derived from this unit during research). The DAC chip is definitely the weakest link in this unit. However, if you need a nice looking DAC/AMP and don’t care for balanced connectors, this is still a very good choice. You’re definitely paying some premium for the looks though.
  • High-End Stuff: I will refrain from recommending anything specific, but here is a random list that might interest you. Always do your own research and never blindly trust strangers.
    • Massdrop THX AAA 789
    • ADI-2-DAC
    • SMSL SP200 THX
    • SMSL SU-8 v2
    • SDAC Balanced
    • Schiit Modius
    • Topping DX7Pro
    • Other brands: THX powered AMPs, Chord, iFi, etc.