❮ Pandas Series Pandas Cleaning ❯

Pandas JSON

JSON (JavaScript Object Notation, JavaScript Object Notation) is a syntax for storing and exchanging text information, similar to XML.

JSON is smaller, faster, and easier to parse than XML. For more JSON content, refer to the JSON Tutorial.

Pandas can conveniently handle JSON data. This article uses the sites.json as an example, with the following content:

Example

[
   {
   "id": "A001",
   "name": "tutorialpro.org",
   "url": "www.tutorialpro.org",
   "likes": 61
   },
   {
   "id": "A002",
   "name": "Google",
   "url": "www.google.com",
   "likes": 124
   },
   {
   "id": "A003",
   "name": "Taobao",
   "url": "www.taobao.com",
   "likes": 45
   }
]

Example

import pandas as pd

df = pd.read_json('sites.json')

print(df.to_string())

to_string() is used to return DataFrame data. We can also directly process JSON strings.

Example

import pandas as pd

data =[
    {
      "id": "A001",
      "name": "tutorialpro.org",
      "url": "www.tutorialpro.org",
      "likes": 61
    },
    {
      "id": "A002",
      "name": "Google",
      "url": "www.google.com",
      "likes": 124
    },
    {
      "id": "A003",
      "name": "Taobao",
      "url": "www.taobao.com",
      "likes": 45
    }
]
df = pd.DataFrame(data)

print(df)

The output of the above example is:

id    name             url  likes
0  A001    tutorialpro.org  www.tutorialpro.org     61
1  A002  Google  www.google.com    124
2  A003      淘宝  www.taobao.com     45

JSON objects have the same format as Python dictionaries, so we can directly convert Python dictionaries into DataFrame data:

Example

import pandas as pd

# JSON in dictionary format
s = {
    "col1":{"row1":1,"row2":2,"row3":3},
    "col2":{"row1":"x","row2":"y","row3":"z"}
}

# Convert JSON to DataFrame
df = pd.DataFrame(s)
print(df)

The output of the above example is:

col1 col2
row1     1    x
row2     2    y
row3     3    z

Reading JSON data from a URL:

Example

import pandas as pd

URL = 'https://static.tutorialpro.org/download/sites.json'
df = pd.read_json(URL)
print(df)

The output of the above example is:

id    name             url  likes
0  A001    tutorialpro.org  www.tutorialpro.org     61

1 A002 Google www.google.com 124 2 A003 Taobao www.taobao.com 45 0 A001 Tom 60 66 61 ABC primary school Year 1 1 A002 James 89 76 51 ABC primary school Year 1 2 A003 Jenny 79 90 78 ABC primary school Year 1

{
    "students": [
        {
            "id": "A001",
            "name": "Alice",
            "grade": {
                "math": 60,
                "physics": 80,
                "chemistry": 90
            }
        },
        {
            "id": "A002",
            "name": "Bob",
            "grade": {
                "math": 89,
                "physics": 76,
                "chemistry": 51
            }
        },
        {
            "id": "A003",
            "name": "Jenny",
            "grade": {
                "math": 79,
                "physics": 90,
                "chemistry": 78
            }
        }
    ]
}

Here we need to use the glom module to handle nested data. The glom module allows us to access properties of nested objects using ..

First, we need to install glom:

pip3 install glom

Example

import pandas as pd
from glom import glom

df = pd.read_json('nested_deep.json')

data = df['students'].apply(lambda row: glom(row, 'grade.math'))
print(data)

The above example outputs:

0    60
1    89
2    79
Name: students, dtype: int64

❮ Pandas Series Pandas Cleaning ❯