安云网 - AnYun.ORG | 专注于网络信息收集、网络数据分享、网络安全研究、网络各种猎奇八卦。
当前位置: 安云网 > 技术关注 > 编程茶楼 > Python > Handling multipart/form-data natively in Python

Handling multipart/form-data natively in Python

时间:2019-10-21来源:未知 作者:安云网点击:
RFC7578 (who obsoletes RFC2388 ) defines the multipart/form-data type that is usually transported over HTTP when users submit forms on your Web page. Nowadays, it tends to be replaced by JSON encoded payloads; nevertheless, it is still widely used. Wh
//内容来自安云网

RFC7578 (who obsoletes RFC2388) defines the multipart/form-datatype that is usually transported over HTTP when users submit forms on your Web page. Nowadays, it tends to be replaced by JSON encoded payloads; nevertheless, it is still widely used.

//内容来自AnYun.ORG

While you could decode an HTTP body request made with JSON natively with Python — thanks to the json module — there is no such way to do that with multipart/form-data. That's something barely understandable considering how old the format is. //内容来自AnYun.ORG

There is a wide variety of way available to encode and decode this format. Libraries such as requests support this natively without making you notice, and the same goes for the majority of Web server frameworks such as Django or Flask. //本文来自安云网

However, in certain circumstances, you might be on your own to encode or decode this format, and it might not be an option to pull (significant) dependencies.

Encoding

The multipart/form-data format is quite simple to understand and can be summarised as an easy way to encode a list of keys and values, i.e., a portable way of serializing a dictionary.

There's nothing in Python to generate such an encoding. The format is quite simple and consists of the key and value surrounded by a random boundary delimiter. This delimiter must be passed as part of the Content-Type, so that the decoder can decode the form data.

There's a simple implementation in urllib3 that does the job. It's possible to summarize it in this simple implementation:

import binasciiimport osdef encode_multipart_formdata(fields):
    boundary = binascii.hexlify(os.urandom(16)).decode('ascii')

    body = (
        "".join("--%s\r\n"
                "Content-Disposition: form-data; name=\"%s\"\r\n"
                "\r\n"
                "%s\r\n" % (boundary, field, value)
                for field, value in fields.items()) +
        "--%s--\r\n" % boundary    )

    content_type = "multipart/form-data; boundary=%s" % boundary    return body, content_type


You can use by passing a dictionary where keys and values are bytes. For example:

encode_multipart_formdata({"foo": "bar", "name": "jd"})

Which returns:

--00252461d3ab8ff5c25834e0bffd6f70
Content-Disposition: form-data; name="foo"

bar
--00252461d3ab8ff5c25834e0bffd6f70
Content-Disposition: form-data; name="name"

jd
--00252461d3ab8ff5c25834e0bffd6f70--
multipart/form-data; boundary=00252461d3ab8ff5c25834e0bffd6f70


You can use the returned content type in your HTTP reply header Content-Type. Note that this format is used for forms: it can also be used by emails.

Emails did you say?

Encoding with email

Right, emails are usually encoded using MIME, which is defined by yet another RFC, RFC2046. It turns out that multipart/form-data is just a particular MIME format, and that if you have code that implements MIME handling, it's easy to use it to implement this format.

Fortunately for us, Python standard library comes with a module that handles exactly that: email.mime. I told you it was heavily used by email — I guess that's why they put that code in the email subpackage.

Here's a piece of code that handles multipart/form-data in a few lines of code:

from email import messagefrom email.mime import multipartfrom email.mime import nonmultipartfrom email.mime import textclass MIMEFormdata(nonmultipart.MIMENonMultipart):
    def __init__(self, keyname, *args, **kwargs):
        super(MIMEFormdata, self).__init__(*args, **kwargs)
        self.add_header(
            "Content-Disposition", "form-data; name=\"%s\"" % keyname)def encode_multipart_formdata(fields):
    m = multipart.MIMEMultipart("form-data")

    for field, value in fields.items():
        data = MIMEFormdata(field, "text", "plain")
        data.set_payload(value)
        m.attach(data)

    return m


Using this piece of code returns the following:

ontent-Type: multipart/form-data; boundary="===============1107021068307284864=="
MIME-Version: 1.0

--===============1107021068307284864==
Content-Type: text/plain
MIME-Version: 1.0
Content-Disposition: form-data; name="foo"

bar
--===============1107021068307284864==
Content-Type: text/plain
MIME-Version: 1.0
Content-Disposition: form-data; name="name"

jd
--===============1107021068307284864==--


This method has several advantages over our first implementation:

  • It handles Content-Type for each of the added MIME parts. We could add other data types than just text/plain like it is implicitly done in the first version. We could also specify the charset (encoding) of the textual data.

  • It's very likely more robust by leveraging the wildly tested Python standard library.

The main downside, in that case, is that the Content-Type header is included with the content. In case of handling HTTP, it is problematic as this needs to be sent as part of the HTTP header and not as part of the payload.

It should be possible to build a particular generator from email.generator that does this. I'll leave that as an exercise to you, reader.

Decoding

We must be able to use that same email package to decode our encoded data, right? It turns out that's the case, with a piece of code that looks like this:

import email.parser


msg = email.parser.BytesParser().parsebytes(my_multipart_data)print({
    part.get_param('name', header='content-disposition'): part.get_payload(decode=True)
    for part in msg.get_payload()})


With the example data above, this returns:

{'foo': b'bar', 'name': b'jd'}


Amazing, right?

The moral of this story is that you should never underestimate the power of the standard library. While it's easy to add a single line in your list of dependencies, it's not always required if you dig a bit into what Python provides for you!


本文标题: Handling multipart/form-data natively in Python 安云网
顶一下
(0)
0%
踩一下
(0)
0%
------分隔线----------------------------
发表评论
请自觉遵守互联网相关的政策法规,严禁发布色情、暴力、反动的言论。
评价:
验证码: 点击我更换图片
相关内容
推荐内容