Utilising Low Complexity CNNs to Lift Non-Local Redundancies in Video Coding
Jan P. Klopp, Liang-Gee Chen, Shao-Yi Chien
Graduate Institute of Electrical Engineering, National Taiwan University
Main Results
- Denoiser trained online (at encoding time)
- Design of a highly efficient CNN
- Parameters can be signalled to decoder
- Compared to H.265: coding gain of up to 6.8% on chroma and up to 14.4% on luma
- Complexity of 216 to 486 MAC/pixel, a thousand times lower than that of pretrained denoisers
Methods
- Train CNN on the fly to predict residuals of a group of pictures (GoP)
- Quantise the CNN's parameters
- Test the CNN's performance
- Compress parameters and add to bit stream if test is positive
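The steps above can be sketched as a small decision routine. This is a minimal illustration under stated assumptions, not the authors' implementation: the uniform quantiser, the rate-distortion test `distortion + lam * rate < baseline_distortion`, and all names (`encode_gop_cnn`, `train`, `distortion_with`, `bits_per_param`, `lam`) are hypothetical, since the page does not specify how the test or the parameter compression is carried out.

```python
from typing import Callable, List, Optional, Sequence


def quantise(params: Sequence[float], step: float) -> List[int]:
    """Uniform quantisation (an assumption): map each parameter to an integer level."""
    return [round(p / step) for p in params]


def dequantise(levels: Sequence[int], step: float) -> List[float]:
    """Reconstruct the parameter values the decoder would see."""
    return [q * step for q in levels]


def encode_gop_cnn(
    train: Callable[[], Sequence[float]],
    distortion_with: Callable[[Sequence[float]], float],
    baseline_distortion: float,
    step: float,
    bits_per_param: float,
    lam: float,
) -> Optional[List[int]]:
    """Return the quantised levels to signal, or None if the test fails."""
    params = train()                      # 1. train on the current GoP
    levels = quantise(params, step)       # 2. quantise the parameters
    # 3. test with the quantised parameters, exactly as the decoder would use them
    distortion = distortion_with(dequantise(levels, step))
    rate = bits_per_param * len(levels)   # hypothetical cost of signalling the parameters
    # 4. signal only if the gain outweighs the parameter cost (assumed criterion)
    if distortion + lam * rate < baseline_distortion:
        return levels
    return None
```

Testing after quantisation rather than before matters here: the decoder only ever sees the quantised parameters, so the decision should be based on their performance.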
The dashed arrows and boxes indicate data transfers and operations that are carried out only
in streaming scenarios, where the decoder can access the previous GoP's data.
- Test previously signalled CNN on the new GoP
- Fine-tune CNN on new GoP
- Quantise fine-tuned CNN's parameters
- Test fine-tuned and quantised CNN
- If gains are higher than from previous CNN, compress and add to bit stream
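The streaming steps above amount to choosing between the previously signalled CNN and a fine-tuned one. The following is a rough sketch of that choice, assuming `gain` is any scalar measure where larger is better; the function and parameter names are hypothetical and the callables stand in for the actual fine-tuning, quantisation and evaluation on the new GoP.

```python
from typing import Callable, Sequence, Tuple

Params = Sequence[float]


def update_cnn_for_gop(
    prev_params: Params,
    fine_tune: Callable[[Params], Params],
    quantise: Callable[[Params], Params],
    gain: Callable[[Params], float],
) -> Tuple[Params, bool]:
    """Return (parameters to use for the new GoP, whether they must be signalled)."""
    prev_gain = gain(prev_params)                # test the previously signalled CNN
    candidate = quantise(fine_tune(prev_params)) # fine-tune, then quantise
    if gain(candidate) > prev_gain:              # test the fine-tuned, quantised CNN
        return candidate, True                   # compress and add to the bit stream
    return prev_params, False                    # reuse the previous CNN, nothing to signal
```

When the previous CNN is kept, the decoder already holds its parameters, so no additional parameter bits are needed for that GoP.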
Material
Paper (IEEE Xplore)
Paper (preprint)
Additional Results
We supply a few preliminary results for applying our method to two codecs used in practice: x265 and AV1. Our method
has not been adjusted for x265 or AV1, so please refer to the results in the paper (link above) for a full
comparison. Coding gain is reported as BD-rate relative to the respective baseline codec (negative values indicate
bitrate savings). YUV PSNR values were averaged with weights 6, 1 and 1, as is common practice in video coding.
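As a small illustration of the 6:1:1 weighting mentioned above, the combined PSNR can be computed as a weighted mean of the per-channel values. The function name is hypothetical and the inputs are assumed to be per-channel PSNR values in dB.

```python
def weighted_yuv_psnr(psnr_y: float, psnr_u: float, psnr_v: float,
                      weights: tuple = (6, 1, 1)) -> float:
    """Weighted mean of the Y, U and V PSNR values (default weights 6:1:1)."""
    wy, wu, wv = weights
    return (wy * psnr_y + wu * psnr_u + wv * psnr_v) / (wy + wu + wv)


# Example: luma at 40 dB, both chroma channels at 44 dB
print(weighted_yuv_psnr(40.0, 44.0, 44.0))  # 41.0
```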
Comparison with x265 as baseline
| Dataset | Resolution | GoP 32 Frames: Complexity (MAC/pixel) | GoP 32 Frames: Coding Gain | GoP 128 Frames: Complexity (MAC/pixel) | GoP 128 Frames: Coding Gain |
|---|---|---|---|---|---|
| HM A | 2560×1600 | 148.5 | -6.5% | 148.5 | -6.9% |
| HM B | 1920×1080 | 486 | -5.5% | 486 | -5.7% |
| HM C | 832×480 | 486 | -5.1% | 486 | -5.4% |
| VTM A | 3840×2160 | 148.5 | -4.9% | 148.5 | -5.0% |
| VTM B | 1920×1080 | 148.5 | -4.5% | 486 | -5.4% |
Comparison with AV1 as baseline
| Dataset | Resolution | GoP 32 Frames: Complexity (MAC/pixel) | GoP 32 Frames: Coding Gain | GoP 128 Frames: Complexity (MAC/pixel) | GoP 128 Frames: Coding Gain |
|---|---|---|---|---|---|
| HM A | 2560×1600 | 148.5 | -4.8% | 148.5 | -4.7% |
| HM B | 1920×1080 | 486 | -4.3% | 486 | -4.2% |
| HM C | 832×480 | 486 | -3.8% | 486 | -4.2% |
| VTM A | 3840×2160 | 148.5 | -3.0% | 148.5 | -3.0% |
| VTM B | 1920×1080 | 486 | -4.0% | 486 | -3.9% |
Sequences:
- HM A: NebutaFestival, PeopleOnStreet, SteamLocomotiveTrain, Traffic
- HM B: BasketballDrive, BQTerrace, Cactus, Kimono1, ParkScene
- HM C: BasketballDrill, BQMall, PartyScene, RaceHorses
- VTM A: Campfire, CatRobot1, DaylightRoad2, FoodMarket4, ParkRunning3, Tango2
- VTM B: BasketballDrive, BQTerrace, Cactus, MarketPlace, RitualDance
Cite
If you find our work helpful for your research, please consider citing it:
@ARTICLE{9088301,
author={J. P. {Klopp} and L. {Chen} and S. {Chien}},
journal={IEEE Transactions on Image Processing},
title={Utilising Low Complexity CNNs to Lift Non-Local Redundancies in Video Coding},
year={2020},
volume={},
number={},
pages={1-1},
doi={10.1109/TIP.2020.2991525},
ISSN={1941-0042},
month={},}