Oracle

Count, Distinct, and Nulls

Count, Distinct, and Nulls

By Yuli Vasiliev

Counting distinct values in a dataset that may include nulls is a fairly common operation when it comes to data analysis. Sometimes, you need to count only not-null distinct values, while next time you may need to include nulls in your count. The examples in this article cover both situations, querying the ext_json external table created as described in one of my preceding articles: Loading JSON Through External Tables that can be found at https://sswug.org/yulivasiliev/oracle/loading-json-through-external-tables/

The ext_json external table mentioned above derives data from the PurchaseOrders.dmp dump file that comes with Oracle Database 12c as a sample. This file contains 10000 sample PO documents in JSON format, each of which represents a row when loaded into the ext_json table. Proceeding to the sample PO document structure, it contains an Address object, which in turn includes field country, among others. Let’s perform an analysis of the entire dataset to find out how many distinct values field country contains. This can be done with the following query:

SELECT COUNT(DISTINCT a.JSON_DOC.ShippingInstructions.Address.country) from ext_json a;

As you can see, there are only four countries presented in the sample PO documents:

4

Now, to find out if there are POs that contain null values in the country fields, you might issue the following query:

SELECT COUNT(DISTINCT coalesce(a.JSON_DOC.ShippingInstructions.Address.country, ‘x’)) from ext_json a;

The result illustrates that there are such values. However, it remains unclear how many.

5

Let’s first however learn how many PO documents include non-null values in their country field. This can be done with the following query:

SELECT COUNT(a.JSON_DOC.ShippingInstructions.Address.country) from ext_json a;

This should generate the following result, showing the number of POs with non-null country field:

9899

Now to find out how many null country fields are in the entire dataset, you might use the following query that uses a simple math to calculate the result:

SELECT COUNT(coalesce(a.JSON_DOC.ShippingInstructions.Address.country, ‘x’)) – COUNT(a.JSON_DOC.ShippingInstructions.Address.country) from ext_json a;

Expectedly, the above should generate the following number:

101

While some fields – like country discussed so far – may actually do with a few distinct values over the entire dataset, others – like Reference that will be discussed below – are supposed to have a unique value in each document. Let’s perform some analysis to make sure it is actually so in this dataset. First let’s count the number of POs that have non-null values in the Reference field:

SELECT COUNT(a.JSON_DOC.Reference) from ext_json a;

Expectedly, it gives you the following number:

10000

However, applying the DISTINCT clause in the above query shows that there are duplicate values in this field:

SELECT COUNT(DISTINCT a.JSON_DOC.Reference) from ext_json a;

8861

To make sure there are no nulls out here, you might issue the following query:

SELECT COUNT(DISTINCT COALESCE(a.JSON_DOC.Reference, ‘x’)) from ext_json a;

This should give you the same result as previously, which indicates that there are no null values among references:

8861

Counting the duplicates can be done now as follows:

SELECT COUNT(a.JSON_DOC.Reference) – COUNT(DISTINCT a.JSON_DOC.Reference) from ext_json a;

1139

Conclusion

As you learned in this article, COUNT, DISTINCT, and COALESCE can be used together as an efficient means for counting datasets containing null values in some fields.

About the Author

Yuli Vasiliev is a software developer, freelance author, and consultant currently specializing in open source development, Java technologies, business intelligence (BI), databases, and service-oriented architecture (SOA). He is the author of Oracle Business Intelligence: The Condensed Guide to Analysis and Reporting (Packt, 2010) and PHP Oracle Web Development (Packt, 2007) as well as a series of other books on the Oracle technology.