CVE-2019-13288 in XPDF 3.02 (infinite recursion)
Fuzzer: AFL++

Preparation

Test case

Download Xpdf 3.02

mkdir -p $HOME/fuzzing_xpdf && cd $HOME/fuzzing_xpdf
wget https://dl.xpdfreader.com/old/xpdf-3.02.tar.gz
tar -xvzf xpdf-3.02.tar.gz

Build Xpdf (this build is only a sanity check; later we run make clean and rebuild with instrumentation)

cd xpdf-3.02
sudo apt update && sudo apt install -y build-essential gcc
./configure --prefix="$HOME/fuzzing_xpdf/install/"
make
make install

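configure accepts many variables worth knowing; the main ones are:

  • AS: name of the assembler
  • CC: name of the C compiler
  • CXX: name of the C++ compiler
  • CPP: name of the C preprocessor
  • xFLAGS, where the prefix names a tool (for example CFLAGS, CXXFLAGS): options passed to that tool
  • LD: name of the linker
  • AR: name of the archiver
  • RANLIB: name of the tool that adds a symbol index to an archive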

At startup the fuzzer needs seed samples, which it then mutates step by step; a few arbitrary PDF files are enough

cd $HOME/fuzzing_xpdf
mkdir pdf_examples && cd pdf_examples
wget https://github.com/mozilla/pdf.js-sample-files/raw/master/helloworld.pdf
wget http://www.africau.edu/images/default/sample.pdf
wget https://www.melbpc.org.au/wp-content/uploads/2017/10/small-example-pdf-file.pdf

Install AFL++

First, some dependencies, such as Clang 11 and LLVM 11

sudo apt-get update
sudo apt-get install -y build-essential python3-dev automake git flex bison libglib2.0-dev libpixman-1-dev python3-setuptools
sudo apt-get install -y lld-11 llvm-11 llvm-11-dev clang-11 || sudo apt-get install -y lld llvm llvm-dev clang
sudo apt-get install -y gcc-$(gcc --version|head -n1|sed 's/.* //'|sed 's/\..*//')-plugin-dev libstdc++-$(gcc --version|head -n1|sed 's/.* //'|sed 's/\..*//')-dev

Then build and install AFL++

cd $HOME
git clone https://github.com/AFLplusplus/AFLplusplus && cd AFLplusplus
export LLVM_CONFIG="llvm-config-11"
make distrib
sudo make install

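Some of the steps here involve QEMU-Nyx and can take quite a while. If the build hangs for too long, you can press Ctrl+C and run make again. Once the build finishes, run afl-fuzz; if it prints the AFL++ usage text, the installation is complete.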

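Using AFL++

Instrument and build with AFL's bundled compiler

To mutate inputs more effectively, AFL can instrument the target when we have its source code, which lets it track in real time what each mutated input achieves. So we now clean up the earlier test build and rebuild the target with the fuzzer's instrumenting compiler.

First, clean up the previous build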

rm -r $HOME/fuzzing_xpdf/install
cd $HOME/fuzzing_xpdf/xpdf-3.02/
make clean

Then we build with afl-clang-fast instrumentation; if the variables are unclear, scroll back up to the list above

export LLVM_CONFIG="llvm-config-11"
CC=$HOME/AFLplusplus/afl-clang-fast CXX=$HOME/AFLplusplus/afl-clang-fast++ ./configure --prefix="$HOME/fuzzing_xpdf/install/"
make
make install

Now we can start fuzzing

afl-fuzz -i $HOME/fuzzing_xpdf/pdf_examples/ -o $HOME/fuzzing_xpdf/out/ -s 123 -- $HOME/fuzzing_xpdf/install/bin/pdftotext @@ $HOME/fuzzing_xpdf/output

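Again, a quick rundown of the arguments:

  • -i: directory containing the input samples
  • -o: directory where the output is stored
  • -s: seed for the random number generator used while fuzzing; fixed at 123 here to make the results as reproducible as possible
  • --: marks the end of afl-fuzz's own options; everything after it is the target command line
  • @@: placeholder that afl-fuzz replaces with the path of the input file
    • with @@: the fuzzed program reads its input from a file
    • without @@: the fuzzed program reads its input from standard input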
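
To make the @@ distinction concrete, here is a minimal C++ sketch of how a target might choose its input source. It is not Xpdf code: readAll and readTestCase are hypothetical helpers, and the sketch simply assumes the input path arrives as argv[1], as it does for pdftotext in the command above.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <iterator>
#include <string>

// Hypothetical helper: read everything that is left in a stream.
std::string readAll(std::istream &in) {
  return std::string(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
}

// Hypothetical helper: choose where the test case comes from.
// With "@@", afl-fuzz puts the path of the mutated file on the command
// line, so the target opens argv[1]. Without "@@", the data arrives on
// standard input, represented here by the `fallback` stream.
std::string readTestCase(int argc, char **argv, std::istream &fallback) {
  assert(argc >= 1 && argv != nullptr);
  if (argc > 1) {
    std::ifstream file(argv[1], std::ios::binary);
    return readAll(file);
  }
  return readAll(fallback);
}
```

A real main() would pass std::cin as the fallback stream. Either way the target sees the same bytes; only the delivery channel differs.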

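Then let the fuzzer run and wait for results.

(A single crash is enough to stop; there is no need to sit there for a whole hour like I foolishly did.)

The generated queue (seed) files and crash files are in the output directory given with -o.

Vulnerability analysis

Dynamic debugging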

gdb --args $HOME/fuzzing_xpdf/install/bin/pdftotext $HOME/fuzzing_xpdf/out/default/crashes/<crashes_filename> $HOME/fuzzing_xpdf/output

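The run looks roughly like this: a pile of Error messages (probably because the mutated file is full of invalid input), ending with Program received signal SIGSEGV, Segmentation fault. The faulting location is in malloc.c, which at first glance looks like a heap memory problem.

Run bt to see the crash path (interrupt it manually in time, otherwise it floods the screen).

The location is line 94 of xpdf/Parser.cc, and the frames after it look like infinite recursion that keeps calling Parser::getObj().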

// stream objects are not allowed inside content streams or
// object streams
if (allowStreams && buf2.isCmd("stream")) {
  if ((str = makeStream(obj, fileKey, encAlgorithm, keyLength, objNum, objGen))) {
    obj->initStream(str);
  } else {
    obj->free();
    obj->initError();
  }
} else {
  shift();
}

Line 94 is the makeStream() call; set a breakpoint there and debug it in VS Code.

We start in main(), which roughly prepares an output device and then calls displayPages()

// write text file
textOut = new TextOutputDev(textFileName->getCString(), physLayout, rawOrder, htmlMeta);
if (textOut->isOk()) {
  doc->displayPages(textOut, firstPage, lastPage, 72, 72, 0, gFalse, gTrue, gFalse);
} else {
  delete textOut;
  exitCode = 2;
  goto err3;
}
delete textOut;

Step into displayPages(): it calls displayPage() for every page of the file, and displayPage() fetches the page by number and calls display() on it

void PDFDoc::displayPage(OutputDev *out, int page,
                         double hDPI, double vDPI, int rotate,
                         GBool useMediaBox, GBool crop, GBool printing,
                         GBool (*abortCheckCbk)(void *data),
                         void *abortCheckCbkData) {
  if (globalParams->getPrintCommands()) {
    printf("***** page %d *****\n", page);
  }
  catalog->getPage(page)->display(out, hDPI, vDPI,
                                  rotate, useMediaBox, crop, printing, catalog,
                                  abortCheckCbk, abortCheckCbkData);
}

void PDFDoc::displayPages(OutputDev *out, int firstPage, int lastPage,
                          double hDPI, double vDPI, int rotate,
                          GBool useMediaBox, GBool crop, GBool printing,
                          GBool (*abortCheckCbk)(void *data),
                          void *abortCheckCbkData) {
  int page;
  for (page = firstPage; page <= lastPage; ++page) {
    displayPage(out, page, hDPI, vDPI, rotate, useMediaBox, crop, printing,
                abortCheckCbk, abortCheckCbkData);
  }
}

Step into display(), which directly calls displaySlice()

void Page::display(OutputDev *out, double hDPI, double vDPI,
                   int rotate, GBool useMediaBox, GBool crop,
                   GBool printing, Catalog *catalog,
                   GBool (*abortCheckCbk)(void *data),
                   void *abortCheckCbkData) {
  displaySlice(out, hDPI, vDPI, rotate, useMediaBox, crop,
               -1, -1, -1, -1, printing, catalog,
               abortCheckCbk, abortCheckCbkData);
}

void Page::displaySlice(OutputDev *out, double hDPI, double vDPI,
                        int rotate, GBool useMediaBox, GBool crop,
                        int sliceX, int sliceY, int sliceW, int sliceH,
                        GBool printing, Catalog *catalog,
                        GBool (*abortCheckCbk)(void *data),
                        void *abortCheckCbkData) {
#ifndef PDF_PARSER_ONLY
  PDFRectangle *mediaBox, *cropBox;
  PDFRectangle box;
  Gfx *gfx;
  Object obj;
  Annots *annotList;
  Dict *acroForm;
  int i;
  if (!out->checkPageSlice(this, hDPI, vDPI, rotate, useMediaBox, crop,
                           sliceX, sliceY, sliceW, sliceH,
                           printing, catalog,
                           abortCheckCbk, abortCheckCbkData)) {
    return;
  }
  rotate += getRotate();
  if (rotate >= 360) {
    rotate -= 360;
  } else if (rotate < 0) {
    rotate += 360;
  }
  makeBox(hDPI, vDPI, rotate, useMediaBox, out->upsideDown(),
          sliceX, sliceY, sliceW, sliceH, &box, &crop);
  cropBox = getCropBox();
  if (globalParams->getPrintCommands()) {
    mediaBox = getMediaBox();
    printf("***** MediaBox = ll:%g,%g ur:%g,%g\n",
           mediaBox->x1, mediaBox->y1, mediaBox->x2, mediaBox->y2);
    printf("***** CropBox = ll:%g,%g ur:%g,%g\n",
           cropBox->x1, cropBox->y1, cropBox->x2, cropBox->y2);
    printf("***** Rotate = %d\n", attrs->getRotate());
  }
  gfx = new Gfx(xref, out, num, attrs->getResourceDict(),
                hDPI, vDPI, &box, crop ? cropBox : (PDFRectangle *)NULL,
                rotate, abortCheckCbk, abortCheckCbkData);
  contents.fetch(xref, &obj);
  // ...

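Debugging shows that the infinite recursion is entered at contents.fetch(xref, &obj);. xref is a member of the Page class; according to the comments it is the PDF file's xref table (cross-reference table).

In our crash file the xref table is clearly corrupted.

For background on PDF file structure, see the article PDF文档结构详解 (a detailed explanation of PDF document structure).

Stepping further into contents.fetch(), execution flows to xref->fetch()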

Object *Object::fetch(XRef *xref, Object *obj) {
  return (type == objRef && xref) ?
         xref->fetch(ref.num, ref.gen, obj) : copy(obj);
}

The actual call here is xref->fetch(7, 0, obj);. Step into XRef::fetch()

Object *XRef::fetch(int num, int gen, Object *obj) {
  XRefEntry *e;
  Parser *parser;
  Object obj1, obj2, obj3;
  // check for bogus ref - this can happen in corrupted PDF files
  if (num < 0 || num >= size) {
    goto err;
  }
  e = &entries[num];
  switch (e->type) {
  case xrefEntryUncompressed:
    if (e->gen != gen) {
      goto err;
    }
    obj1.initNull();
    parser = new Parser(this,
               new Lexer(this,
                 str->makeSubStream(start + e->offset, gFalse, 0, &obj1)),
               gTrue);
    parser->getObj(&obj1);
    parser->getObj(&obj2);
    parser->getObj(&obj3);
    if (!obj1.isInt() || obj1.getInt() != num ||
        !obj2.isInt() || obj2.getInt() != gen ||
        !obj3.isCmd("obj")) {
      obj1.free();
      obj2.free();
      obj3.free();
      delete parser;
      goto err;
    }
    parser->getObj(obj, encrypted ? fileKey : (Guchar *)NULL,
                   encAlgorithm, keyLength, num, gen);
    obj1.free();
    obj2.free();
    obj3.free();
    delete parser;
    break;
  // ...

Execution goes one level deeper at parser->getObj(obj, encrypted ? fileKey : (Guchar *)NULL, encAlgorithm, keyLength, num, gen);, where the actual call is getObj(obj, NULL, cryptRC4, 0, 7, 0);. Keep stepping in

Object *Parser::getObj(Object *obj, Guchar *fileKey,
                       CryptAlgorithm encAlgorithm, int keyLength,
                       int objNum, int objGen) {
  // ...
  // array
  if (buf1.isCmd("[")) {
    // ...
  // dictionary or stream
  } else if (buf1.isCmd("<<")) {
    shift();
    obj->initDict(xref);
    while (!buf1.isCmd(">>") && !buf1.isEOF()) {
      if (!buf1.isName()) {
        error(getPos(), "Dictionary key must be a name object");
        shift();
      } else {
        key = copyString(buf1.getName());
        shift();
        if (buf1.isEOF() || buf1.isError()) {
          gfree(key);
          break;
        }
        obj->dictAdd(key, getObj(&obj2, fileKey, encAlgorithm, keyLength,
                                 objNum, objGen));
      }
    }
    if (buf1.isEOF())
      error(getPos(), "End of file inside dictionary");

    // stream objects are not allowed inside content streams or
    // object streams
    if (allowStreams && buf2.isCmd("stream")) {
      if ((str = makeStream(obj, fileKey, encAlgorithm, keyLength,
                            objNum, objGen))) {
        // ...

Execution goes one level deeper at makeStream(obj, fileKey, encAlgorithm, keyLength, objNum, objGen). It calls dict->dictLookup("Length", &obj); to fetch the entry and then checks whether the value is an int

Stream *Parser::makeStream(Object *dict, Guchar *fileKey,
                           CryptAlgorithm encAlgorithm, int keyLength,
                           int objNum, int objGen) {
  Object obj;
  BaseStream *baseStr;
  Stream *str;
  Guint pos, endPos, length;
  // get stream start position
  lexer->skipToNextLine();
  pos = lexer->getPos();
  // get length
  dict->dictLookup("Length", &obj);
  if (obj.isInt()) {
    length = (Guint)obj.getInt();
    obj.free();
  } else {
    error(getPos(), "Bad 'Length' attribute in stream");
    obj.free();
    return NULL;
  }
  // ...

Step into dict->dictLookup("Length", &obj);

inline Object *Object::dictLookup(char *key, Object *obj)
  { return dict->lookup(key, obj); }

It is just an inline wrapper, so keep stepping in

Object *Dict::lookup(char *key, Object *obj) {
  DictEntry *e;
  return (e = find(key)) ? e->val.fetch(xref, obj) : obj->initNull();
}

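Object::fetch() is called next. The xref table exists, and single-stepping shows that val is of type objRef, so we end up at the familiar XRef::fetch() again. Stepping in shows that the actual call is once more xref->fetch(7, 0, obj);, which closes the loop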

Object *Object::fetch(XRef *xref, Object *obj) {
  return (type == objRef && xref) ?
         xref->fetch(ref.num, ref.gen, obj) : copy(obj);
}

To summarize the whole chain for our input file:

  1. main() reaches displaySlice() through displayPages(), displayPage(), and display()
  2. displaySlice() calls contents.fetch(xref, &obj). contents is an objRef whose ref pair is (num=7, gen=0)
  3. Because the xref table exists, xref->fetch(ref.num=7, ref.gen=0, obj) is called
  4. Inside xref->fetch(), the entry is of type xrefEntryUncompressed, so parser->getObj(obj, fileKey=NULL, encAlgorithm=<RC4>, keyLength=0, num=7, gen=0) is called
  5. Parser::getObj() calls makeStream(obj, fileKey=NULL, encAlgorithm=<RC4>, keyLength=0, objNum=7, objGen=0)
  6. Parser::makeStream() calls obj->dictLookup("Length", &newobj), a wrapper around obj->dict->lookup("Length", &newobj), to look up the "Length" entry in the stream dictionary of the PDF object and store its value in newobj
  7. The "Length" entry is found, so val.fetch(xref, &newobj) is called
  8. val is of type objRef and xref exists; since val's ref pair is also (num=7, gen=0), xref->fetch(7, 0, &newobj) is called again, which returns to step 3 and recurses forever

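So the root cause is this: the value of the "Length" entry is an objRef (here pointing back at object 7 itself) rather than the objInt the author expected; an objInt would simply be returned as a copy by copy(). The type check only happens after dict->dictLookup(), the entry point of the recursion, has returned, presumably because lookup is called so often that the author kept it free of extra checks for performance. The reference is therefore resolved before it is ever validated, and the bug is triggered.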
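
The cycle can be reproduced in a few lines with a toy model. This is only a sketch and not xpdf code: LengthRefs and lengthChainLoops are made-up names, and the model keeps nothing except which object each /Length entry refers to. It uses a visited set to show that the chain for the crash input never reaches an integer.

```cpp
#include <cassert>
#include <map>
#include <set>

// Toy model, not xpdf code. Maps an object number to the object number its
// /Length entry refers to, or to -1 when /Length is a direct integer.
using LengthRefs = std::map<int, int>;

// Follows the /Length references starting at object `num` and reports
// whether the chain revisits an object. A revisit means the unbounded
// fetch -> getObj -> makeStream -> dictLookup chain would never terminate.
bool lengthChainLoops(const LengthRefs &refs, int num) {
  assert(num >= 0);  // object numbers are non-negative
  std::set<int> seen;
  while (true) {
    auto it = refs.find(num);
    if (it == refs.end() || it->second < 0) {
      return false;  // unknown object or direct integer: the chain ends
    }
    if (!seen.insert(num).second) {
      return true;   // this object was already visited: a cycle
    }
    num = it->second;
  }
}
```

With a single entry mapping 7 to 7, which mirrors the crash input, the function reports a loop; with 7 referring to 8 and 8 holding a direct integer, it does not.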

Fix

Looking at Parser.cc in a newer release, the fix is blunt but simple: cap the recursion depth

// Max number of nested objects. This is used to catch infinite loops
// in the object structure.
#define recursionLimit 500
// ...
Object *Parser::getObj(Object *obj, GBool simpleOnly,
                       Guchar *fileKey,
                       CryptAlgorithm encAlgorithm, int keyLength,
                       int objNum, int objGen, int recursion) {
  // ...
  else if (!simpleOnly && recursion < recursionLimit && buf1.isCmd("<<")) {
    // ...
    // stream objects are not allowed inside content streams or
    // object streams
    if (allowStreams && buf2.isCmd("stream")) {
      if ((str = makeStream(obj, fileKey, encAlgorithm, keyLength,
                            objNum, objGen, recursion + 1))) {
        // ...
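
The effect of threading a depth counter through the calls can be sketched with a toy model. This is an illustration under simplifying assumptions, not the patched xpdf code: LengthRefs, fetchDepth, and kRecursionLimit are hypothetical names, and only the /Length reference of each object is modeled.

```cpp
#include <cassert>
#include <map>

// Toy model, not xpdf code. Maps an object number to the object number its
// /Length entry refers to, or to -1 when /Length is a direct integer.
using LengthRefs = std::map<int, int>;

const int kRecursionLimit = 500;  // same value as recursionLimit in the patch

// Each nested fetch passes recursion + 1 down and refuses to go deeper once
// the limit is reached. Returns how many nested fetches were performed
// before the chain ended or was cut off.
int fetchDepth(const LengthRefs &refs, int num, int recursion = 0) {
  assert(recursion >= 0);
  if (recursion >= kRecursionLimit) {
    return recursion;  // limit hit: stop instead of recursing further
  }
  auto it = refs.find(num);
  if (it == refs.end() || it->second < 0) {
    return recursion;  // nothing left to resolve
  }
  return fetchDepth(refs, it->second, recursion + 1);
}
```

For a self-referencing object the chain is cut off after 500 levels instead of exhausting the stack, while a legitimate one-hop reference stops at depth 1. The cap does not detect the cycle as such; it only bounds the damage.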

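Someone else's fix is also worth a look: it adds a dedicated lookup function, Dict::lookupLength(), which calls copy() directly and returns a copy, skipping the objRef check and dereference done inside fetch(), and so prevents the infinite recursion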

Object *Dict::lookup(char *key, Object *obj) {
  DictEntry *e;
  return (e = find(key)) ? e->val.fetch(xref, obj) : obj->initNull();
}
Object *Dict::lookupLength(char *key, Object *obj) {
  DictEntry *e;
  return (e = find(key)) ? e->val.copy(obj) : obj->initNull();
}
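
The difference between copying and fetching can be shown with a toy model. Again this is a sketch, not xpdf code: ToyObj, lookupLengthNoFetch, and lengthIsUsable are hypothetical stand-ins for Object, Dict::lookupLength(), and the isInt() check in makeStream().

```cpp
#include <cassert>

// Toy model, not xpdf code.
enum ToyType { toyInt, toyRef };

struct ToyObj {
  ToyType type;
  int value;  // the integer itself, or the referenced object number
};

// Stand-in for Dict::lookupLength(): hand back the stored value as-is, the
// way copy() would, without following an indirect reference.
ToyObj lookupLengthNoFetch(const ToyObj &stored) {
  return stored;
}

// Stand-in for the isInt() check in makeStream(): a reference that was left
// unresolved is not an integer, so the stream is rejected instead of
// triggering another fetch.
bool lengthIsUsable(const ToyObj &stored) {
  ToyObj len = lookupLengthNoFetch(stored);
  assert(len.type == toyInt || len.type == toyRef);
  return len.type == toyInt;
}
```

Since nothing is dereferenced, no recursion can start from the Length lookup. The likely cost, judging from the makeStream() code shown earlier, is that any indirect Length, including a well-formed one, would fail the isInt() check and the stream would be rejected.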