'''"Executable documentation" for the pickle module.

Extensive comments about the pickle protocols and pickle-machine opcodes
can be found here.  Some functions meant for external use:

genops(pickle)
   Generate all the opcodes in a pickle, as (opcode, arg, position) triples.

dis(pickle, out=None, memo=None, indentlevel=4)
   Print a symbolic disassembly of a pickle.
'''

import codecs
import io
import pickle
import re
import sys

__all__ = ['dis', 'genops', 'optimize']

bytes_types = pickle.bytes_types

# Other ideas:
#
# - A pickle verifier:  read a pickle and check it exhaustively for
#   well-formedness.  dis() does a lot of this already.
#
# - A protocol identifier:  examine a pickle and return its protocol number
#   (== the highest .proto attr value among all the opcodes in the pickle).
#   dis() already prints this info at the end.
#
# - A pickle optimizer:  for example, tuple-building code is sometimes more
#   elaborate than necessary, catering for the possibility that the tuple
#   is recursive.  Or lots of times a PUT is generated that's never accessed
#   by a later GET.


# "A pickle" is a program for a virtual pickle machine (PM, but more accurately
# called an unpickling machine).  It's a sequence of opcodes, interpreted by the
# PM, building an arbitrarily complex Python object.
#
# For the most part, the PM is very simple:  there are no looping, testing, or
# conditional instructions, no arithmetic and no function calls.  Opcodes are
# executed once each, from first to last, until a STOP opcode is reached.
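#
# A minimal sketch of the "first to last, until STOP" rule, assuming only the
# standard pickle module.  The helper name _example_stop_ends_program is
# hypothetical and not part of this module's API.  The pickle is written by
# hand in protocol 0, and the sketch relies on pickle.loads() ignoring any
# bytes that follow the first STOP opcode.

def _example_stop_ends_program():
    import pickle  # local import keeps the sketch self-contained

    # b'I1\n' pushes the int 1 and b'.' is STOP.  The second program that
    # follows the first STOP is never executed.
    program = b'I1\n.' + b'I2\n.'
    return pickle.loads(program)

# Calling _example_stop_ends_program() should return the int 1, not 2.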
#
# The PM has two data areas, "the stack" and "the memo".
#
# Many opcodes push Python objects onto the stack; e.g., INT pushes a Python
# integer object onto the stack, whose value is read from the decimal string
# literal immediately following the INT opcode in the pickle bytestream.  Other
# opcodes take Python objects off the stack.  The result of unpickling is
# whatever object is left on the stack when the final STOP opcode is executed.
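#
# The stack behaviour described above can be sketched with a small pickle
# written by hand in protocol 0.  The helper name _example_stack_program is
# hypothetical; the opcodes used (MARK, INT, TUPLE, STOP) are ordinary
# protocol-0 opcodes, and only the standard pickle module is assumed.

def _example_stack_program():
    import pickle  # local import keeps the sketch self-contained

    program = (
        b'('      # MARK: remember the current stack position
        b'I1\n'   # INT: push the int 1
        b'I2\n'   # INT: push the int 2
        b't'      # TUPLE: pop everything above the mark, push (1, 2)
        b'.'      # STOP: the object left on the stack is the result
    )
    return pickle.loads(program)

# Calling _example_stack_program() should return the tuple (1, 2).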
#
# The memo is simply an array of objects, or it can be implemented as a dict
# mapping little integers to objects.  The memo serves as the PM's "long term
# memory", and the little integers indexing the memo are akin to variable
# names.  Some opcodes store the object on top of the stack into the memo at a
# given index (without popping it), and others push the memo object at a given
# index onto the stack again.
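#
# As an illustration of the memo, the sketch below runs a hand-written
# protocol-0 pickle that saves a list in memo slot 0 with PUT and fetches it
# back with GET, so the same list object ends up in the result twice.  The
# helper name _example_memo_sharing is hypothetical, and only the standard
# pickle module is assumed.

def _example_memo_sharing():
    import pickle  # local import keeps the sketch self-contained

    program = (
        b'(l'     # MARK, LIST: push a new empty list (the outer list)
        b'(l'     # MARK, LIST: push a second empty list (the inner list)
        b'p0\n'   # PUT 0: record the stack top (the inner list) as memo[0]
        b'a'      # APPEND: pop the inner list, append it to the outer list
        b'g0\n'   # GET 0: push memo[0], the very same inner list, again
        b'a'      # APPEND: append it a second time
        b'.'      # STOP: the outer list is the result
    )
    return pickle.loads(program)

# The result should equal [[], []], with both elements being one shared list
# object rather than two equal copies.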
#
# At heart, that's all the PM has.  Subtleties arise for these reasons:
#
# + Object identity.  Objects can be arbitrarily complex, and subobjects
#   may be shared (for example, the list [a, a] refers to the same object a
#   twice).  It can be vital that unpickling recreate an isomorphic object
#   graph, faithfully reproducing sharing.
#
# + Recursive objects.  For example, after "L = []; L.append(L)", L is a
#   list, and L[0] is the same list.  This is related to the object identity
#   point, and some sequences of pickle opcodes are subtle in order to
#   get the right result in all cases.
#
# + Things pickle doesn't know everything about.  Examples of things pickle
#   does know everything about are Python's builtin scalar and container
#   types, like ints and tuples.  They generally have opcodes dedicated to
#   them.  For things like module references and instances of user-defined
#   classes, pickle's knowledge is limited.  Historically, many enhancements
#   have been made to the pickle protocol in order to do a better (faster,
#   and/or more compact) job on those.
#
# + Backward compatibility and micro-optimization.  As explained below,
#   pickle opcodes never go away, not even when better ways to do a thing
#   get invented.  The repertoire of the PM just keeps growing over time.
#   For example, protocol 0 had two opcodes for building Python integers (INT
#   and LONG), protocol 1 added three more for more-efficient pickling of short
#   integers, and protocol 2 added two more for more-efficient pickling of
#   long integers (before protocol 2, the only ways to pickle a Python long
#   took time quadratic in the number o