HomeAboutArchivesTools & ResourcesSupport me
Hi! 馃憢 I'm Szymon

I help you become a better software developer.

  • 1.15kFollowers
  • 2.36kSubscribers
  • 110Followers
  • 30.2kReputation

Latest posts from the jq cookbook

Merging JSON files recursively in the command-line
How to convert JSON to CSV from the command-line?
Parsing JSON in command-line with jq: basic filters and funct...
More

Popular categories

  • Groovy Cookbook28
  • Jenkins Pipeline Cookbook9
  • Programmer's Bookshelf6
  • Micronaut Cookbook4
  • Ratpack Cookbook4
  • Learning Java4
  • jq cookbook3
  • How to3
  • Blog Reports2

Don't miss anything - join my newsletter

Additional materials and updates for subscribers. No spam guaranteed.
Unsubscribe at any time.

Did you like this article?

Spread the !
  • 鈽曪笍
  1. e.printstacktrace.blog
  2. jq cookbook

Parsing JSON in command-line with jq: basic filters and functions (part 1)

  • May 24, 2020
  • 12 min. read
  • 0 Comments

Have you ever wondered, what is the most convenient way to parse JSON data in the Unix/Linux command line? For instance, how to parse some information from the curl JSON response? Grep? No, thank you. There is a better tool for that. And it’s called jq.

Welcome to the first episode of the jq cookbook series! The first episode is not about theory - we jump to the first exercise in just a second. I strongly encourage you to install jq command-line tool in your operating system and follow me. There are some exciting things to learn today, and we are starting right now!

Table of Contents
  • Introduction
  • Invoking jq
  • Processing input from command other than curl
  • Extracting fields from the array
  • Filtering null values with select(expr) filter
  • Sorting by field with sort_by(expr) filter
  • Limiting the size of the result with limit(n;expr) filter
  • Grouping by using group_by(expr) filter
  • Putting it all together

Introduction

In this blog post, I will use openlibrary.org public Search API. The reason for that is simple - it returns quite complex and interesting data we can process, and it still mimics a real-life example. I want to use this API to run a full-text search query. I’m going to use my last name as a search term - it is not a very popular Polish last name, so there shouldn’t be many results associated with it.

The JSON response used in this example is 3631 lines long. To keep all code samples clear, I will limit the JSON output to max. 10-15 lines. The original document can be found here - openlibrary.json.
Listing 1. An example of openlibrary.org full-text search query
$ curl -s "http://openlibrary.org/search.json?q=st%C4%99pniak"
{
 "start": 0,
 "num_found": 44,
 "numFound": 44,
 "docs": [
    ...
  ]
}
My last name contains Polish letter 臋, which gets encoded as %C4%99.

Invoking jq

Once we have the JSON document to process, it’s time to invoke jq processor. The simplest way to do it is to pipe the output from the curl command to the jq input.

Listing 2. Pipe from curl to jq example.
$ curl -s "http://openlibrary.org/search.json?q=st%C4%99pniak" | jq .
The . in the jq filter means identity.

This simple pipe doesn’t do much, and jq just prints the JSON "as is." However, there is one difference when compared with the plain curl response. jq by default prints JSON document using "pretty" format, and it adds some basic colors.

Figure 1. Plain curl output on the left side. The simplest jq . output on the right side.

Processing input from command other than curl

Using pipes comes with huge benefits - it doesn’t matter what command produces the input. It means that we can forward the curl command output to a file, and then use this file to produce input for the jq processor. This might be very useful when we do experiments, and we don’t want to download the same JSON document over and over again.

Listing 3. Forwarding curl output to the openlibrary.json file
$ curl -s "http://openlibrary.org/search.json?q=st%C4%99pniak" > openlibrary.json

Once we captured the input, we can e.g., pipe commands cat and jq to produce the same result, but this time without involving the call over the network.

Listing 4. Pipe from cat to jq example.
$ cat openlibrary.json | jq .

However, using cat is not necessary if we want to read the input from a file - jq takes a file as an argument as well. In the all upcoming examples, I will use this syntax to save a few characters.

Listing 5. Taking input from the file example.
$ jq . openlibrary.json

Extracting fields from the array

Let’s start processing the data with some real filters. The document we took from the openlibrary.org Search API has a list under the docs key. This list contains all documents matching the search term. Every document is described using multiple fields, but we are interested in the title, author_name, and publish_year only.

The first filter we want to use is called array/object value iterator .[]. This filter extracts values from the given array (or object), so the next filter in the pipeline gets applied to each element. In our case, we want to apply .docs[] filter to extract documents.

Listing 6. Applying array value iterator on the docs key.
$ jq '.docs[]' openlibrary.json
{
  "title_suggest": "S艂ownik tajemnych gwar przest臋pczych",
  "edition_key": [
    "OL1174034M"
  ],
  "isbn": [
    "9781859170069",
    "1859170064"
  ],
  ....

The next step is to extract only specific fields. jq filters support piping to apply new filters on top of the results from the previous ones. It uses the same | symbol, but pay attention to one significant difference - piping filters happens inside the jq argument string.

Listing 7. Piping jq filters to extract specific fields.
$ jq '.docs[] | {title,author_name,publish_year}' openlibrary.json
{
  "title": "S艂ownik tajemnych gwar przest臋pczych",
  "author_name": [
    "Klemens St臋pniak"
  ],
  "publish_year": [
    1993
  ]
}
{
  "title": "EU adjustment to eastern enlargement",
  "author_name": null,
  "publish_year": [
    1998
  ]
}
...

Looks good. Every document got extracted to the new one, containing only three fields. However, author_name and publish_year seem to be arrays. We can update the filter to extract the first value from those arrays and store them as scalar values instead.

Listing 8. Transforming arrays to scalar values example.
$ jq '.docs[] | {title,author_name: .author_name[0], publish_year: .publish_year[0]}' openlibrary.json
{
  "title": "S艂ownik tajemnych gwar przest臋pczych",
  "author_name": "Klemens St臋pniak",
  "publish_year": 1993
}
{
  "title": "EU adjustment to eastern enlargement",
  "author_name": null,
  "publish_year": 1998
}
...

I guess you already noticed that we can transform objects either by just referring to the field name (e.g. title) or by constructing the new field explicitly (e.g. author_name: .author_name[0].)

Filtering null values with select(expr) filter

When I looked at the response, I noticed that some documents were missing either author_name or publish_year. We can exclude such documents from the final result using the select(expr) filter.

Listing 9. Filtering objects containing null values.
$ jq '.docs[] | {title,author_name: .author_name[0], publish_year: .publish_year[0]} | select(.author_name!=null and .publish_year!=null)' openlibrary.json
{
  "title": "S艂ownik tajemnych gwar przest臋pczych",
  "author_name": "Klemens St臋pniak",
  "publish_year": 1993
}
{
  "title": "Integracja regionalna i transfer kapita艂u",
  "author_name": "Andrzej St臋pniak",
  "publish_year": 1996
}
...

Sorting by field with sort_by(expr) filter

Now let’s say we want to get the result in a specific order. What if we want to sort the final result by publish_year? jq offer the sort_by(expr) filter, but before we apply it, we need to make one modification. The sort_by(expr) filter works on arrays, and our current output is not an array. If you look closely, you will see that this is just a set of JSON documents printed one after another. We can transform it into a JSON array by wrapping our current long filter with square brackets - an array constructor.

Compare the following example with the previous one and spot the difference.

Listing 10. Transforming set of JSON objects into a JSON array of objects using [] constructor.
$ jq '[.docs[] | {title,author_name: .author_name[0], publish_year: .publish_year[0]} | select(.author_name!=null and .publish_year!=null)]' openlibrary.json
[
  {
    "title": "S艂ownik tajemnych gwar przest臋pczych",
    "author_name": "Klemens St臋pniak",
    "publish_year": 1993
  },
  {
    "title": "Integracja regionalna i transfer kapita艂u",
    "author_name": "Andrzej St臋pniak",
    "publish_year": 1996
  },
  ...
]

Once we have a JSON array constructed, we can pipe sort_by(.publish_year) into our filter.

Listing 11. Sorting a JSON array of objects by specific field using sort_by(expr).
$ jq '[.docs[] | {title,author_name: .author_name[0], publish_year: .publish_year[0]} | select(.author_name!=null and .publish_year!=null)] | sort_by(.publish_year)' openlibrary.json
[
  {
    "title": "Rada Portu i Dr贸g Wodnych w Wolnym Mie艣cie Gda艅sku.  Gda艅sku",
    "author_name": "Henryk St臋pniak",
    "publish_year": 1971
  },
  {
    "title": "Potr膮cenie w systemie polskiego prawa cywilnego",
    "author_name": "Lechos艂aw St臋pniak",
    "publish_year": 1975
  },
  {
    "title": "Uzdatnianie z艂omu poprodukcyjnego",
    "author_name": "Stanis艂aw St臋pniak",
    "publish_year": 1978
  },
  ...
]

By default, sort_by(expr) sorts objects in the ascending order. To sort objects in the descending order, we can pipe reverse filter after the sort_by(expr) one.

Listing 12. Sorting in the descending order.
$ jq '[.docs[] | {title,author_name: .author_name[0], publish_year: .publish_year[0]} | select(.author_name!=null and .publish_year!=null)] | sort_by(.publish_year) | reverse' openlibrary.json
[
  {
    "title": "Dziedzictwo archiwalne we wsp贸艂pracy Polski i Ukrainy",
    "author_name": "W艂adys艂aw St臋pniak",
    "publish_year": 2010
  },
  {
    "title": "Misja S艂u偶by Wi臋ziennej a jej zadania wobec aktualnej polityki karnej i oczekiwa艅 spo艂ecznych",
    "author_name": "Polski Kongres Penitencjarny (4th 2008 Pozna艅, Poland)",
    "publish_year": 2008
  },
  {
    "title": "Czarny B贸r",
    "author_name": "W艂adys艂aw St臋pniak",
    "publish_year": 2007
  },
  ...
]

Limiting the size of the result with limit(n;expr) filter

As the next step, we may want to limit the size of the result. Let’s say we want to get the newest three publications as a result. We can use limit(n;expr) filter, but we need to be aware of one thing. To limit the size of an array using this filter, we need to use an expression that extracts values from an array for iteration - .[]. Once limit(n;expr) limits the number of objects, it produces the output in the same format. It means that we don’t get an array but rather a set of separate results instead.

Listing 13. Applying limit(3;.[]) filter.
$ jq '[.docs[] | {title,author_name: .author_name[0], publish_year: .publish_year[0]} | select(.author_name!=null and .publish_year!=null)] | sort_by(.publish_year)| reverse | limit(3;.[])' openlibrary.json
{
  "title": "Dziedzictwo archiwalne we wsp贸艂pracy Polski i Ukrainy",
  "author_name": "W艂adys艂aw St臋pniak",
  "publish_year": 2010
}
{
  "title": "Misja S艂u偶by Wi臋ziennej a jej zadania wobec aktualnej polityki karnej i oczekiwa艅 spo艂ecznych",
  "author_name": "Polski Kongres Penitencjarny (4th 2008 Pozna艅, Poland)",
  "publish_year": 2008
}
{
  "title": "Czarny B贸r",
  "author_name": "W艂adys艂aw St臋pniak",
  "publish_year": 2007
}

If we want to represent the result as a JSON array, we need to wrap limit(n;expr) filter with [] to construct an array again.

Listing 14. Final JSON array of the three latest documents.
$ jq '[.docs[] | {title,author_name: .author_name[0], publish_year: .publish_year[0]} | select(.author_name!=null and .publish_year!=null)] | sort_by(.publish_year)| reverse | [limit(3;.[])]' openlibrary.json
[
  {
    "title": "Dziedzictwo archiwalne we wsp贸艂pracy Polski i Ukrainy",
    "author_name": "W艂adys艂aw St臋pniak",
    "publish_year": 2010
  },
  {
    "title": "Misja S艂u偶by Wi臋ziennej a jej zadania wobec aktualnej polityki karnej i oczekiwa艅 spo艂ecznych",
    "author_name": "Polski Kongres Penitencjarny (4th 2008 Pozna艅, Poland)",
    "publish_year": 2008
  },
  {
    "title": "Czarny B贸r",
    "author_name": "W艂adys艂aw St臋pniak",
    "publish_year": 2007
  }
]

Grouping by using group_by(expr) filter

Let’s say we want to find out how many publications each author created. jq offers built-in group_by(expr) filter we can use. In our case, we will use group_by(.author_name) to group all publications by their authors.

Listing 15. Grouping results using group_by(.author_name) filter.
$ jq '[.docs[] | {title,author_name: .author_name[0], publish_year: .publish_year[0]} | select(.author_name!=null and .publish_year!=null)] | group_by(.author_name)' openlibrary.json
[
  [
    {
      "title": "Integracja regionalna i transfer kapita艂u",
      "author_name": "Andrzej St臋pniak",
      "publish_year": 1996
    },
    {
      "title": "Promocja polskich podmiot贸w inwestycyjnych na obszarze Wsp贸lnot Europejskich",
      "author_name": "Andrzej St臋pniak",
      "publish_year": 1993
    },
    {
      "title": "Kwestia narodowa a spo艂eczna na 艢l膮sku Cieszy艅skim pod koniec XIX i w pocz膮tkach XX wieku, do 1920 roku",
      "author_name": "Andrzej St臋pniak",
      "publish_year": 1986
    },
    {
      "title": "Polska-WE",
      "author_name": "Andrzej St臋pniak",
      "publish_year": 1993
    }
  ],
  [
    {
      "title": "Kula jako symbol vanitas",
      "author_name": "Beata Purc-St臋pniak",
      "publish_year": 2004
    }
  ],
  ....
]

As you can see now, group_by(.author_name) produced an array of arrays. It’s hard to say how many publications each author did from this document, so we might need to transform this output a bit.

The next filter we need to apply is .[] to start iterating over the grouped arrays. We will pipe this iteration with a transformation that extracts author_name field from the first result using {author_name: .[0].author_name} filter. And as we seen before, .[] | {author_name: .[0].author_name} combination produces a set of results, so we need to wrap it with [] to construct an array.

Listing 16. Extracting author_name from the grouped result.
$ jq '[.docs[] | {title,author_name: .author_name[0], publish_year: .publish_year[0]} | select(.author_name!=null and .publish_year!=null)] | group_by(.author_name)| [.[] | {author_name: .[0].author_name}]' openlibrary.json
[
  {
    "author_name": "Andrzej St臋pniak"
  },
  {
    "author_name": "Beata Purc-St臋pniak"
  },
  {
    "author_name": "Colloquia Jerzy Skowronek dedicata (10th 2004 Warsaw, Poland)"
  },
  ....
]

That’s a good step forward - now we have author names extracted from the grouped result. The next step is to update the transformation filter to also produce the count field - a field containing the number of publications. There is a simple way to do it. The current transformation filter, [.[] | {author_name: .[0].author_name}], iterates the nested arrays that contain all documents associated with the same author. It means that . (identity) in this filter refers to a list of objects, and that is why we extract the author name using .[0].author_name expression, which means: take the first object from the array and return its author_name value. We can use this array to count its size. In the jq, we have a filter called length that can be piped with an array to return its size. So in our case, an expression . | lenght used in the context of this transformation will return the size of an array, which represents the number of publications.

Listing 17. Extracting the number of publications using length filter.
$ jq '[.docs[] | {title,author_name: .author_name[0], publish_year: .publish_year[0]} | select(.author_name!=null and .publish_year!=null)] | group_by(.author_name)| [.[] | {author_name: .[0].author_name, count: . | length}]' openlibrary.json
[
  {
    "author_name": "Andrzej St臋pniak",
    "count": 4
  },
  {
    "author_name": "Beata Purc-St臋pniak",
    "count": 1
  },
  {
    "author_name": "Colloquia Jerzy Skowronek dedicata (10th 2004 Warsaw, Poland)",
    "count": 1
  },
  ....
]

Voila!

Putting it all together

Let’s finish this experiment by putting all commands we have learned today together. We can add sort_by(expr) and limit(n;expr) to the latest query to produce the final result - an array of three authors that published the most documents.

Listing 18. Extracting the number of publications using length filter.
$ jq '[.docs[] | {title,author_name: .author_name[0], publish_year: .publish_year[0]} | select(.author_name!=null and .publish_year!=null)] | group_by(.author_name)| [.[] | {author_name: .[0].author_name, count: . | length}] | sort_by(.count) | reverse | [limit(3;.[])]' openlibrary.json
[
  {
    "author_name": "W艂adys艂aw St臋pniak",
    "count": 11
  },
  {
    "author_name": "Andrzej St臋pniak",
    "count": 4
  },
  {
    "author_name": "Henryk St臋pniak",
    "count": 3
  }
]

And that’s it for today. In the next episode, we will dive deeper into more advanced filters and functions that jq offers. Stay tuned!

Parsing JSON from curl with jq: sort_by, group_by, limit (part 1) | #jq #json #curl

In this video, I show you how to use the jq command-line to parse and transform a JSON response to the desired format. I use openlibrary.org Search API to get a complex JSON document (~3600 lines) using curl command for further parsing using jq. In the first step, we extract three fields for each document (title, author, publishing year) and then we continue transforming the JSON - we sort it, we limit the number of elements, and last but not least, we group elements by the author name. Watch now »

  • devops
  • rest
  • http
  • curl
  • jq
  • json
  • linux

Programmer's Bookshelf

Atomic Habits - book review

  • June 2, 2020
  • 12 min. read
  • 0 Comments

Sticking to the process of achieving my goals has always been one of the most difficult obstacles to overcome, both on my way to becoming a better professional, and in my private life. I have found that in some areas I need to make an extraordinary amount of effort to keep myself on the right track, no matter how S.M.A.R.T. (Specific, Measurable, Achievable, Relevant, Time-based) my goals are. I couldn鈥檛 understand why in some cases I was able to make the desired progress, while in others I kept failing. Was it a matter of talent I was missing, or a skill I haven鈥檛 acquired yet? I have seen the dots, but I didn鈥檛 yet know how to connect them.

Ratpack Cookbook

Using the same prefix with different HTTP methods in Ratpack

  • April 1, 2019
  • 12 min. read
  • 0 Comments

Ratpack is an excellent tool for building RESTful[1] applications. However, to benefit most of i...

jq cookbook

How to convert JSON to CSV from the command-line?

  • August 25, 2020
  • 12 min. read
  • 0 Comments

Are you looking for an easy and fast way to convert a JSON file to a CSV one? In this blog post,...

Jenkins Pipeline Cookbook

How to catch curl response in Jenkins Pipeline?

  • June 22, 2020
  • 12 min. read
  • 0 Comments

In this blog post, I explain why you may want to use curl command in your Jenkinsfile, how to ca...

Any thoughts or ideas?

Let's talk in the comment's section 馃挰

Want to put a code sample in the comment? Read the Syntax highlighting guide for more information.
Empty
Latest blog posts
  • Merging JSON files recursively in the command-line
  • Jenkins Declarative Pipeline with the dynamic agent - how to configure it?
  • Groovy Ecosystem Usage Report (2020)
  • How to convert JSON to CSV from the command-line?
  • 5 Common Jenkins Pipeline Mistakes
  • How to merge two maps in Groovy?
  • Building stackoverflow-cli with Java 11, Micronaut, Picocli, and GraalVM
  • How to catch curl response in Jenkins Pipeline?
  • Atomic Habits - book review
  • Parsing JSON in command-line with jq: basic filters and functions (part 1)
Trending videos
Jenkins Declarative Pipeline vs Scripted Pipeline - 4 key differences | #jenkinspipeline

Jenkins Pipeline as a code is a new standard for defining continuous integration and delivery pipelines in Jenkins. The scrip...

5 Common Jenkins Pipeline Mistakes 馃敟

In this Jenkins Pipeline tutorial video, I reveal five common Jenkins Pipeline mistakes and I explain how those mistakes can ...

Useful links
  • Start here
  • About
  • Archives
  • Resources
  • Privacy Policy
  • Merchandise
  • My Kit
  • RSS
  • Support the blog
Popular categories
  • Groovy Cookbook28
  • Jenkins Pipeline Cookbook9
  • Programmer's Bookshelf6
  • Micronaut Cookbook4
  • Ratpack Cookbook4
  • Learning Java4
  • jq cookbook3
  • How to3
  • Blog Reports2
Popular tags
affiliate async benchmark blogging book career cicd continuous-integration curl devops docker git github graalvm gradle grails groovy haskell hexo java java-8 jenkins jenkins-pipeline jenkinsfile jmh jq json junit learning maven metaprogramming micronaut native-image non-blocking progress ratpack reactive-programming reading recursion review rxjava sdkman session split spock stackoverflow string tail-call tail-recursion unit-test
  • Designed by @wololock
  • Created with Hexo
  • Hosted on GitHub
  • Deployed with Circle CI
  • License CC BY-NC-SA 4.0