
Different ways to read JSON files in Python

Choose the right library for reading JSON files in Python based on your use case and performance needs.


Introduction

Python offers several ways to read JSON files and JSON data. The json module in the standard library covers most needs, but other libraries can be a better fit depending on your use case. Here are the most common options and when to use each.

We compared the json, pandas, simplejson, ijson, ujson and orjson libraries.

  1. json: If you don't want to install external libraries and need to parse simple JSON data.
  2. pandas: If you're working with structured data, such as a JSON file containing an array of objects or another table-like structure. pandas itself runs on a single machine; for large-scale distributed processing, Apache Spark offers a pandas-compatible API (the pandas API on Spark).
  3. simplejson: A good choice for projects that need to support older versions of Python, since it works with codebases written for Python 2.5 or later.
  4. ijson: If you need a streaming JSON parser. ijson is ideal for processing JSON data from network sources, or large files on disk that can't be loaded into memory, when you are not working with a large-scale distributed system.
  5. ujson: A lightweight, fast JSON library. However, ujson has been put into maintenance-only mode, so orjson is recommended instead.
  6. orjson: If you need high-performance JSON processing, orjson is an excellent choice. It is the fastest of the libraries discussed in this post, according to the benchmarks cited below. It can also serialize datetime and UUID objects natively if your use case needs that.

Below is a detailed explanation of each library with code examples:

1. Using the built-in json module

The json module is part of the Python standard library and provides methods for parsing and converting JSON data.

a. Reading JSON from a File

You can use the json.load() method to read a JSON file and convert it into a Python dictionary or list. For example, consider a file named data.json that contains an array of JSON objects. A sample object is shown below:

[
    {
        "employee_id": "d3009d51-47cc-4bad-9061-008cb2d549a0",
        "full_name": "Menard Pietrowski",
        "age": 59,
        "job_title": "Analyst Programmer",
        "salary": 82220.43,
        "hire_date": "10/18/2017",
        "department": "Finance",
        "location": "Suite 22",
        "email": "mpietrowski0@nationalgeographic.com",
        "phone_number": "520-592-9983"
    }
]
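import json

# Open the file with an explicit encoding and parse it into Python objects
with open('data.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

print(type(data))  # a list, because the file's top level is a JSON array
print(data)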

Running python json_example.py prints:

<class 'list'>
[{'employee_id': 'd3009d51-47cc-4bad-9061-008cb2d549a0', 'full_name': 'Menard Pietrowski', 'age': 59, 'job_title': 'Analyst Programmer', 'salary': 82220.43, 'hire_date': '10/18/2017', 'department': 'Finance', 'location': 'Suite 22', 'email': 'mpietrowski0@nationalgeographic.com', 'phone_number': '520-592-9983'}]
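The example above assumes data.json already exists. Below is a minimal self-contained sketch: it writes a trimmed copy of the sample record to a temporary file and reads it back with json.load. The temporary file and the trimmed `employees` list are illustrative assumptions, not part of the original example. Passing encoding='utf-8' matters because open() otherwise uses the platform's default encoding, while RFC 8259 requires UTF-8 for JSON exchanged between systems.

```python
import json
import os
import tempfile

# Illustrative, trimmed sample record (same shape as the data.json example)
employees = [
    {
        "employee_id": "d3009d51-47cc-4bad-9061-008cb2d549a0",
        "full_name": "Menard Pietrowski",
        "age": 59,
        "salary": 82220.43,
    }
]

# Write the sample to a temporary file so the example runs on its own
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False, encoding="utf-8") as tmp:
    json.dump(employees, tmp)
    path = tmp.name

try:
    # Read it back; json.load parses the whole file into Python objects
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
finally:
    os.remove(path)  # clean up the temporary file

print(type(data).__name__)   # list
print(data[0]["full_name"])  # Menard Pietrowski
```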

b. Reading JSON from a String

If you have JSON data as a string instead of a file, you can use the json.loads() method to convert it into a Python dictionary or list. Here’s an example:

import json

json_data = '[{"employee_id":"d3009d51-47cc-4bad-9061-008cb2d549a0","full_name":"Menard Pietrowski","age":59,"job_title":"Analyst Programmer","salary":82220.43,"hire_date":"10/18/2017","department":"Finance","location":"Suite 22","email":"mpietrowski0@nationalgeographic.com","phone_number":"520-592-9983"}]'
data = json.loads(json_data)

print(type(data))
print(data)
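json.loads raises json.JSONDecodeError (a subclass of ValueError) when its input is not valid JSON. A small sketch of handling that error, using a hypothetical helper named parse_or_report, might look like this:

```python
import json


def parse_or_report(text):
    """Hypothetical helper: return parsed JSON, or None after reporting the error."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        # The exception records where parsing failed
        print(f"Invalid JSON: {e.msg} at line {e.lineno}, column {e.colno}")
        return None


good = parse_or_report('[{"age": 59}]')
bad = parse_or_report('[{"age": 59,}]')  # trailing comma is not valid JSON
```

The wording of the error message differs between Python versions, so avoid matching on it; the lineno and colno attributes are the reliable way to locate the problem.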

2. Using the pandas library

pandas is a versatile and powerful library for data manipulation in Python, and it provides robust tools for handling JSON data. Its rich set of functions for transforming data makes it easy to clean, filter, and aggregate JSON data, and it integrates well with other Python libraries, such as NumPy for numerical computation and Matplotlib for data visualization. Here's an example:

This example uses the same data.json file as the previous section, which holds 1,000 mock employee records.

import pandas as pd

# Read JSON data from a file
df = pd.read_json('data.json')
print(df)

# Group by age and count the number of employees
agg_df = df.groupby('age').size().reset_index(name='count')
print(agg_df)

Running python pandas_example.py prints:

                              employee_id  ...  phone_number
0    d3009d51-47cc-4bad-9061-008cb2d549a0  ...  520-592-9983
1    04c8454e-581a-4709-8c3c-b425b7be505a  ...  245-609-1012
2    7c37aeef-75b0-45fb-9b5c-e06c25880166  ...  124-433-2437
3    403c596a-eb8b-460d-94e7-c31f55f2cc34  ...  416-557-9790
4    f98c4c05-e4a4-47d5-ac1d-df18eebd54dc  ...  196-738-7135
..                                    ...  ...           ...
995  16108901-832a-47ef-afbe-7421a19e5653  ...  285-891-9528
996  09969e80-fa9e-4f9c-84bf-3858027799a4  ...  779-557-9980
997  ad17a95a-d0ee-4a77-91fa-9e055f171c35  ...  124-772-5242
998  cf3bbd49-c7d9-4f21-a018-0c40691b8384  ...  770-272-8874
999  71d1e8af-3a91-4419-b8e4-31214acbeecd  ...  835-246-4218

[1000 rows x 10 columns]

    age  count
0    20     29
1    21     25
2    22     27
3    23     22
4    24     20
..  ...    ...
35   55     28
36   56     20
37   57     27
38   58     32
39   59     19
40   60     19
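To make clear what df.groupby('age').size().reset_index(name='count') computes, here is a rough standard-library equivalent: a count of employees per age, sorted by age. It is a sketch for illustration only, using a small made-up `records` list in place of the parsed data.json, and is not a replacement for pandas on real workloads.

```python
from collections import Counter

# Hypothetical records standing in for the parsed contents of data.json
records = [
    {"full_name": "A", "age": 59},
    {"full_name": "B", "age": 58},
    {"full_name": "C", "age": 59},
]

# Count employees per age, like df.groupby('age').size()
counts = Counter(r["age"] for r in records)

# Sort by age and build rows shaped like reset_index(name='count')
agg = [{"age": age, "count": count} for age, count in sorted(counts.items())]

for row in agg:
    print(row["age"], row["count"])
```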

3. Using the ijson library

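ijson is a Python library designed for iterative parsing of JSON data. Unlike traditional JSON parsers that load the entire JSON document into memory, ijson processes JSON data incrementally. This makes it particularly useful for handling large JSON files that may not fit into memory, or for streaming JSON data from a network source. Note that ijson returns non-integer numbers as decimal.Decimal values, as the salary field in the output shows. Here's an example, where the prefix 'item' tells ijson.items() to yield each element of the top-level array one at a time: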

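import ijson

# Open in binary mode; ijson parses bytes directly
with open('data.json', 'rb') as f:
    # 'item' selects each element of the top-level array, one at a time
    objects = ijson.items(f, 'item')

    for obj in objects:
        print(obj)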

Running python ijson_example.py prints:

{'employee_id': 'd3009d51-47cc-4bad-9061-008cb2d549a0', 'full_name': 'Menard Pietrowski', 'age': 59, 'job_title': 'Analyst Programmer', 'salary': Decimal('82220.43'), 'hire_date': '10/18/2017', 'department': 'Finance', 'location': 'Suite 22', 'email': 'mpietrowski0@nationalgeographic.com', 'phone_number': '520-592-9983'}
{'employee_id': '04c8454e-581a-4709-8c3c-b425b7be505a', 'full_name': 'Ladonna McLoney', 'age': 58, 'job_title': 'Payment Adjustment Coordinator', 'salary': Decimal('86467.42'), 'hire_date': '4/1/2013', 'department': 'Marketing', 'location': 'Apt 184', 'email': 'lmcloney1@prnewswire.com', 'phone_number': '245-609-1012'}
…
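ijson is built for streaming a single large JSON document. When you control the file format, a different technique, JSON Lines (one JSON document per line), lets the standard library stream records in a similar way, since a file object reads lazily line by line. The sketch below uses a hypothetical iter_json_lines helper and a temporary .jsonl file; it is an alternative to ijson, not a reimplementation of it.

```python
import json
import os
import tempfile


def iter_json_lines(path):
    """Hypothetical helper: yield one parsed object per non-empty line."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:  # the file is read one line at a time
            line = line.strip()
            if line:
                yield json.loads(line)


# Create a small illustrative JSON Lines file
rows = [
    {"full_name": "Menard Pietrowski", "age": 59},
    {"full_name": "Ladonna McLoney", "age": 58},
]
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False, encoding="utf-8") as tmp:
    for row in rows:
        tmp.write(json.dumps(row) + "\n")
    path = tmp.name

# The list comprehension consumes the generator before the file is removed
ages = [obj["age"] for obj in iter_json_lines(path)]
os.remove(path)
print(ages)  # [59, 58]
```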

4. Using the simplejson Library

simplejson is a versatile and extensible JSON library for Python that is also compatible with older versions of Python. It is the externally maintained development version of the json module included in the standard library, and a good choice for projects that require customizable JSON processing and need to support a broader range of Python versions. It works on Python versions that predate the standard json module (added in Python 2.6), and it adds options such as use_decimal, which reads and writes decimal.Decimal values without converting them to float.

import simplejson as json

# Read JSON data from a file, parsing non-integer numbers as Decimal
with open('data.json', 'r') as file:
    data = json.load(file, use_decimal=True)

# Serialize the Python data back to an indented (pretty-printed) JSON string
print(json.dumps(data, indent=4, use_decimal=True))

Running python simplejson_example.py prints:

[
    {
        "employee_id": "d3009d51-47cc-4bad-9061-008cb2d549a0",
        "full_name": "Menard Pietrowski",
        "age": 59,
        "job_title": "Analyst Programmer",
        "salary": 82220.43,
        "hire_date": "10/18/2017",
        "department": "Finance",
        "location": "Suite 22",
        "email": "mpietrowski0@nationalgeographic.com",
        "phone_number": "520-592-9983"
    },
    {
        "employee_id": "04c8454e-581a-4709-8c3c-b425b7be505a",
        "full_name": "Ladonna McLoney",
        "age": 58,
        "job_title": "Payment Adjustment Coordinator",
        "salary": 86467.42,
        "hire_date": "4/1/2013",
        "department": "Marketing",
        "location": "Apt 184",
        "email": "lmcloney1@prnewswire.com",
        "phone_number": "245-609-1012"
    }
    ...
]
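The use_decimal option matters because binary floats cannot represent most decimal fractions exactly. The standard json module can produce Decimal values when reading through its parse_float hook; the sketch below compares the two behaviours on an illustrative `payload` string.

```python
import json
from decimal import Decimal

payload = '{"a": 0.1, "b": 0.2}'

# Default: JSON numbers with a fraction become Python floats
as_float = json.loads(payload)
float_sum = as_float["a"] + as_float["b"]

# parse_float=Decimal keeps the exact decimal value, similar in spirit
# to simplejson's use_decimal=True on the reading side
as_decimal = json.loads(payload, parse_float=Decimal)
decimal_sum = as_decimal["a"] + as_decimal["b"]

print(float_sum)    # 0.30000000000000004
print(decimal_sum)  # 0.3
```

Unlike simplejson with use_decimal=True, the standard json.dumps cannot write Decimal values on its own and raises TypeError unless you supply a default hook.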

5. Using the ujson Library

The ujson (UltraJSON) library is a fast, lightweight JSON encoder and decoder written in C, with bindings for Python. It can be used as a drop-in replacement for most uses of the standard json module, which makes it a good fit for applications that need fast and efficient JSON processing. However, it does not support some advanced features offered by other libraries. ujson has been put into maintenance-only mode, so the orjson library, which is also faster than ujson, is recommended instead.

import ujson

# Read JSON data from a file
with open('data.json', 'r') as file:
    data = ujson.load(file)

print(data)

The output is the same as in the json.load example in section 1.
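Because ujson mirrors the basic load, loads, dump and dumps functions of the json module, a common pattern is to prefer it when it is installed and fall back to the standard library otherwise. This is a sketch: json_impl is an arbitrary alias, and keyword arguments beyond the basics differ between the two libraries, so keep the calls simple.

```python
# Prefer ujson when installed; otherwise fall back to the standard library.
try:
    import ujson as json_impl
except ImportError:
    import json as json_impl

# Both modules provide loads() with the same basic behaviour
record = json_impl.loads('{"full_name": "Menard Pietrowski", "age": 59}')
print(json_impl.__name__, record["age"])
```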

6. Using the orjson Library

orjson is a fast, correct JSON library for Python that offers significant performance improvements over the standard json module. It is easy to use and is particularly useful for applications that require high-speed serialization and deserialization of JSON data. On the serialization side, it natively handles additional data types such as datetime and UUID objects. Note that orjson.dumps() returns bytes rather than str, and orjson.loads() accepts either bytes or str. orjson is one of the fastest options available for working with JSON data in Python.

import orjson

# Read the file as bytes; orjson.loads accepts bytes directly, so no decode step is needed
with open('data.json', 'rb') as file:
    data = orjson.loads(file.read())

print(data)

The output is the same as in the json.load example in section 1.
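To see what orjson's native datetime and UUID support saves you, here is how the standard json module needs a default hook to serialize the same types. The to_json_compatible function is a hypothetical helper for illustration; its ISO 8601 output is similar to, but not guaranteed identical with, orjson's formatting.

```python
import json
import uuid
from datetime import datetime, timezone


def to_json_compatible(obj):
    """Hypothetical default hook: convert types json cannot serialize by itself."""
    if isinstance(obj, datetime):
        return obj.isoformat()  # e.g. 2017-10-18T00:00:00+00:00
    if isinstance(obj, uuid.UUID):
        return str(obj)  # canonical 8-4-4-4-12 hex form
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


record = {
    "employee_id": uuid.UUID("d3009d51-47cc-4bad-9061-008cb2d549a0"),
    "hire_date": datetime(2017, 10, 18, tzinfo=timezone.utc),
}

# Without default=..., json.dumps raises TypeError for these values
text = json.dumps(record, default=to_json_compatible)
print(text)
```

With orjson installed, orjson.dumps(record) handles both types without a hook and returns bytes instead of str.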

Performance comparison of libraries

The charts below compare the performance of the ujson, orjson, simplejson and json libraries in calls per second. The data is taken from the benchmarks in the ujson project description.

Encode comparison chart: metrics are in calls/sec; higher is better.

[Figure: Performance comparison of Python JSON libraries during encode]

Decode comparison chart: metrics are in calls/sec; higher is better.

[Figure: Performance comparison of Python JSON libraries during decode]

In summary, the built-in json module should be sufficient if you're working with small or simple JSON data sets. If you need more advanced features or better performance, consider simplejson, ujson, or orjson. Of these three, orjson is the best default: it is the fastest in the benchmarks above, while ujson is now in maintenance-only mode. For files too large to fit in memory, ijson lets you stream the data instead. However, it's always a good idea to benchmark different libraries to determine which best suits your specific use case and requirements. All the code in this blog post can be found in this git repo.
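Published numbers depend on the hardware and data used, so here is a minimal harness for measuring decode speed on your own machine with timeit. It benchmarks json plus whichever of simplejson, ujson and orjson happen to be installed; the synthetic 1,000-record payload is an assumption standing in for your real files.

```python
import importlib
import json
import timeit

# Synthetic payload: 1,000 records shaped like the data.json sample
records = [
    {"employee_id": str(i), "full_name": "Menard Pietrowski", "age": 59, "salary": 82220.43}
    for i in range(1000)
]
payload = json.dumps(records)


def available_decoders():
    """Map library name -> loads function for each library that is installed."""
    decoders = {"json": json.loads}
    for name in ("simplejson", "ujson", "orjson"):
        try:
            decoders[name] = importlib.import_module(name).loads
        except ImportError:
            pass  # library not installed; skip it
    return decoders


def benchmark(number=50):
    """Return decode calls per second for each available library."""
    results = {}
    for name, loads in available_decoders().items():
        # timeit runs immediately, so the lambda uses the current loads function
        seconds = timeit.timeit(lambda: loads(payload), number=number)
        results[name] = number / seconds
    return results


for name, rate in sorted(benchmark().items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>10}: {rate:,.0f} calls/sec")
```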