Results 1 to 7 of 7
  1. #1
    Join Date
    Jan 2017
    Posts
    12

    Answered: selecting specific JSON fields

    Postgresql 9.6, Need help selecting specific data elements from JSON data.

    CREATE TABLE IF NOT EXISTS foo.bar (
    sample_dttm TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    data JSONb NOT NULL,
    PRIMARY KEY (sample_dttm)
    );

    Many rows of JSON data, similar to as follows:
    {
    "network": {
    "version": "1.0.0",
    "data": {
    "9": {
    "id": 9,
    "state": 0,
    "info": {
    "version": 10753,
    "errors": ""
    }
    },
    "41": {
    "id": 41,
    "state": 1,
    "info": {
    "version": 43490,
    "errors": "Message 1"
    }
    },
    "34": {
    "id": 34,
    "state": 0,
    "info": {
    "version": 1842,
    "errors": ""
    }
    },
    "20": {
    "id": 20,
    "state": 0,
    "info": {
    "version": 4983,
    "errors": ""
    }
    }
    }
    }
    }


    I want to select specific fields from the JSON data. For example:

    SELECT sample_dttm,network->data->*->id, network->data->*->info->errors FROM foo.bar WHERE network->data->*->state != 0;
    Should produce:
    "yyyy-mm-dd HH:MMS",41, "Message 1"
    PS, not worried about date format here, just an example...
    Last edited by usao; 01-15-17 at 22:23.

  2. Best Answer
    Posted by shammat

    "You need to "unnest" the embedded objects, then you can select on them:

    Code:
    select bar.sample_dttm, t.doc, t.doc -> 'id' as id, t.doc #>> array['info','errors'] as message, t.doc -> 'state'
    from bar
       cross join lateral jsonb_each(bar.data #> array['network','data']) as t(id,doc)
    where t.doc -> 'state' <> '0';
    The unnesting is done through jsonb_each(bar.data #> array['network','data']) which returns one row for each object inside the "network" key. If you run
    Code:
    select t.*
    from bar
       cross join lateral jsonb_each(bar.data #> array['network','data']) as t(id,doc)
    you will see what this does.

    Online example: http://rextester.com/KAKUI72758

    For details on the JSON functions and operators, see the manual: https://www.postgresql.org/docs/curr...ions-json.html"


  3. #2
    Join Date
    Nov 2003
    Posts
    2,988
    Provided Answers: 23
    You need to "unnest" the embedded objects, then you can select on them:

    Code:
    select bar.sample_dttm, t.doc, t.doc -> 'id' as id, t.doc #>> array['info','errors'] as message, t.doc -> 'state'
    from bar
       cross join lateral jsonb_each(bar.data #> array['network','data']) as t(id,doc)
    where t.doc -> 'state' <> '0';
    The unnesting is done through jsonb_each(bar.data #> array['network','data']) which returns one row for each object inside the "network" key. If you run
    Code:
    select t.*
    from bar
       cross join lateral jsonb_each(bar.data #> array['network','data']) as t(id,doc)
    you will see what this does.

    Online example: http://rextester.com/KAKUI72758

    For details on the JSON functions and operators, see the manual: https://www.postgresql.org/docs/curr...ions-json.html
    I will not read nor answer questions where the SQL code is messy and not formatted properly using [code] tags: http://www.dbforums.com/misc.php?do=bbcode#code

    Tips for good questions:

    http://tkyte.blogspot.de/2005/06/how...questions.html
    http://wiki.postgresql.org/wiki/SlowQueryQuestions
    http://catb.org/esr/faqs/smart-questions.html

  4. #3
    Join Date
    Jan 2017
    Posts
    12

    Thanks

    This works. Thanks for that.
    Im still trying to understand How/Why this works.
    Not yet at a place where I can do this on my own.
    Is there some reading other than bland documentation which describes how these functions work and what they do?
    I really need to be able to figure out how to parse these fields for other queries I need to write.

  5. #4
    Join Date
    Nov 2003
    Posts
    2,988
    Provided Answers: 23
    If you digest the query step by step it's easier to understand, that's why I recommended to only run the plain "unnesting" part.

    The problem with bringing "document oriented" data together with SQL is, that SQL can only deal with sets of data (rows and columns).

    So the first thing you need to do in your query is to turn the single JSON object with many (embedded) objects into multiple rows (this is essentially normalizing your de-normalized data).

    This is what the cross join lateral jsonb_each(..) part does. For JSON object found under the "path" network -> data a separate row is returned by the function with a "sub-set" of the original JSON object as a new JSON object.

    The query
    Code:
    select t.doc
    from bar
       cross join lateral jsonb_each(bar.data #> array['network','data']) as t(id,doc)
    returns:

    doc
    -------------------------------------------------------------------------
    {"id": 9, "info": {"errors": "", "version": 10753}, "state": 0}
    {"id": 20, "info": {"errors": "", "version": 4983}, "state": 0}
    {"id": 34, "info": {"errors": "", "version": 1842}, "state": 0}
    {"id": 41, "info": {"errors": "Message 1", "version": 43490}, "state": 1}



    Once you have that set of rows, you can filter out those that do not have the state = 0 using where t.doc -> 'state' <> '0'.

    The select list simple extract the attributes of the JSON object returned in the column doc, e.g. the t.doc -> 'id' as id


    My personal opinion is that too many people just follow the "JSON hype" and think, just because you can you also should store everything in JSON and forget everything about a sound database model and proper normalization. There might be some cases where doing this is warranted, but if you start unnesting and normalizing your JSON data a lot, then the decision to store it as a JSON object is at least questionable.
    I will not read nor answer questions where the SQL code is messy and not formatted properly using [code] tags: http://www.dbforums.com/misc.php?do=bbcode#code

    Tips for good questions:

    http://tkyte.blogspot.de/2005/06/how...questions.html
    http://wiki.postgresql.org/wiki/SlowQueryQuestions
    http://catb.org/esr/faqs/smart-questions.html

  6. #5
    Join Date
    Jan 2017
    Posts
    12
    Really appreciate this, but I tried to make some variations and it failed, so im not really learning how this works...
    For example, if I have a slight variation where the outerr 2 objects are removed, then I have:
    Code:
    {
      "9": {
        "id": 9,
        "state": 0,
        "info": {
          "version": 10753,
          "errors": ""
        }
      },
      "41": {
        "id": 41,
        "state": 1,
        "info": {
          "version": 43490,
          "errors": "Message 1"
        }
      },
      "34": {
        "id": 34,
        "state": 0,
        "info": {
          "version": 1842,
          "errors": ""
        }
      },
      "20": {
        "id": 20,
        "state": 0,
        "info": {
          "version": 4983,
          "errors": ""
        }
      }
    }
    I then tried to remove the 'network','data' from the array[] syntax and it failed.
    So, I guess I understand the goal of decomposing the object, but I don't know how to do it under various situations, only this one specific case.
    That's why im looking for some kind of book, guide or other reference which will explain how this really works so I can make it work for various situations, not just one specific case.

    Another example:
    Code:
    [
      {
        "id": 0,
        "value": "0.0337"
      },
      {
        "id": 1,
        "value": "0.7371"
      },
    etc...
    ]
    This again is similar, but this time the top-level is an array of objects.
    There must be some written description of the logic used to choose the correct function and arguments to expand the object to rows as you identified above.
    Im just not seeing enough detail to know how to deal with the different cases.
    Last edited by usao; 01-17-17 at 23:27.

  7. #6
    Join Date
    Nov 2003
    Posts
    2,988
    Provided Answers: 23
    Quote Originally Posted by usao View Post
    Really appreciate this, but I tried to make some variations and it failed, so im not really learning how this works...
    For example, if I have a slight variation where the outerr 2 objects are removed, then I have:
    I then tried to remove the 'network','data' from the array[] syntax and it failed.
    Well you use the same function, you just pass it a different (part of) the JSON document - in this case the complete document, not a sub-document at a specific path:
    Code:
    select *
    from bar
      cross join lateral jsonb_each(data) as t(doc);
    Quote Originally Posted by usao View Post
    That's why im looking for some kind of book, guide or other reference which will explain how this really works so I can make it work for various situations, not just one specific case.
    There can't be such a book because there is an infinite number of ways to structure a JSON document. You need to go through the available functions that are documented in the manual: https://www.postgresql.org/docs/curr...ions-json.html and understand what each one does and how that applies to each structure of the JSON documents you have.

    Quote Originally Posted by usao View Post
    Another example: .... This again is similar, but this time the top-level is an array of objects.
    In this case you need to use jsonb_array_elements():
    Code:
    select t.*
    from bar
      cross join lateral jsonb_array_elements(data) as t(doc);
    Quote Originally Posted by usao View Post
    There must be some written description of the logic used to choose the correct function and arguments to expand the object to rows as you identified above.
    The only thing that is documented, is which functions exist, the "logic" on how to apply them is different for each and every JSON document out there.

    Again: if you constantly find yourself unnesting JSON documents, you might want to think about storing your data in a properly normalized model.
    I will not read nor answer questions where the SQL code is messy and not formatted properly using [code] tags: http://www.dbforums.com/misc.php?do=bbcode#code

    Tips for good questions:

    http://tkyte.blogspot.de/2005/06/how...questions.html
    http://wiki.postgresql.org/wiki/SlowQueryQuestions
    http://catb.org/esr/faqs/smart-questions.html

  8. #7
    Join Date
    Jan 2017
    Posts
    12
    Ok, slowly starting to sink in... Let me play with this and see what I can make happen.
    As for the model, the initial tables will contain dumps of the raw json objects from the sources.
    Ideally I will create either base tables or views which give me the normalized form, but to get there I need to be able to parse the json data itself.
    So, have to get over the hump of parsing in order to get to the normalized form.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •