Pandas JSON
JSON (JavaScript Object Notation, JavaScript Object Notation) is a syntax for storing and exchanging text information, similar to XML.
JSON is smaller, faster, and easier to parse than XML. For more JSON content, refer to the JSON Tutorial.
Pandas can conveniently handle JSON data. This article uses the sites.json as an example, with the following content:
Example
[
{
"id": "A001",
"name": "tutorialpro.org",
"url": "www.tutorialpro.org",
"likes": 61
},
{
"id": "A002",
"name": "Google",
"url": "www.google.com",
"likes": 124
},
{
"id": "A003",
"name": "Taobao",
"url": "www.taobao.com",
"likes": 45
}
]
Example
import pandas as pd
df = pd.read_json('sites.json')
print(df.to_string())
to_string()
is used to return DataFrame data. We can also directly process JSON strings.
Example
import pandas as pd
data =[
{
"id": "A001",
"name": "tutorialpro.org",
"url": "www.tutorialpro.org",
"likes": 61
},
{
"id": "A002",
"name": "Google",
"url": "www.google.com",
"likes": 124
},
{
"id": "A003",
"name": "Taobao",
"url": "www.taobao.com",
"likes": 45
}
]
df = pd.DataFrame(data)
print(df)
The output of the above example is:
id name url likes
0 A001 tutorialpro.org www.tutorialpro.org 61
1 A002 Google www.google.com 124
2 A003 淘宝 www.taobao.com 45
JSON objects have the same format as Python dictionaries, so we can directly convert Python dictionaries into DataFrame data:
Example
import pandas as pd
# JSON in dictionary format
s = {
"col1":{"row1":1,"row2":2,"row3":3},
"col2":{"row1":"x","row2":"y","row3":"z"}
}
# Convert JSON to DataFrame
df = pd.DataFrame(s)
print(df)
The output of the above example is:
col1 col2
row1 1 x
row2 2 y
row3 3 z
Reading JSON data from a URL:
Example
import pandas as pd
URL = 'https://static.tutorialpro.org/download/sites.json'
df = pd.read_json(URL)
print(df)
The output of the above example is:
id name url likes
0 A001 tutorialpro.org www.tutorialpro.org 61
1 A002 Google www.google.com 124 2 A003 Taobao www.taobao.com 45 0 A001 Tom 60 66 61 ABC primary school Year 1 1 A002 James 89 76 51 ABC primary school Year 1 2 A003 Jenny 79 90 78 ABC primary school Year 1
{
"students": [
{
"id": "A001",
"name": "Alice",
"grade": {
"math": 60,
"physics": 80,
"chemistry": 90
}
},
{
"id": "A002",
"name": "Bob",
"grade": {
"math": 89,
"physics": 76,
"chemistry": 51
}
},
{
"id": "A003",
"name": "Jenny",
"grade": {
"math": 79,
"physics": 90,
"chemistry": 78
}
}
]
}
Here we need to use the glom module to handle nested data. The glom module allows us to access properties of nested objects using .
.
First, we need to install glom:
pip3 install glom
Example
import pandas as pd
from glom import glom
df = pd.read_json('nested_deep.json')
data = df['students'].apply(lambda row: glom(row, 'grade.math'))
print(data)
The above example outputs:
0 60
1 89
2 79
Name: students, dtype: int64