Perform Distinct Count over MongoDB Nested Subdocuments

Pengjinfeng
3 min readJan 14, 2021

--

Nested structures are common in MongoDB data. The database uses nested subdocuments to represent the one-to-many relationship. But, when you need to group nested subdocuments and perform distinct, you should first expand the nested structure to flat structure before using $group to perform aggregation.
Below is data in fund collection. The task is to group documents by id and get the number of unique fundcode under shares and trade fields respectively.

The MongoDB way of doing this: Use $group to group documents by id, extract fundcode under each document, use $unwind to split documents by shares and trade to convert to flat structure, use $addToSet to perform distinct during the second grouping, and then count fundcode under shares and trade respectively. The whole process is complicated.

All can be easy with esProc. You can rearrange the trade subdocument into a table sequence, group records and perform distinct and count. It’s much easier.
Download esProc installation package and free license file HERE.

Implementation directions:
1. Write script fund.dfx in esProc:

2. Debug and execute the script. Below is A3’s value:

3. Execute the script to return the final result:

Since the fields under subdocument trade have different structures, we convert them into a table sequence for the convenience of subsequent processing.
esProc offers JDBC interface, so you can easily integrate the script into a Java program:

public static void doWork(){
Connection con = null;
java.sql.Statement st;
try{
Class.forName(“com.esproc.jdbc.InternalDriver”);
con = DriverManager.getConnection(“jdbc:esproc:local://”);
// Call script fund.dfx
st=con.createStatement();
ResultSet rst = st.executeQuery(“call fund”);
System.out.println(rst);
}catch(Exception e){
System.out.println(e);
}finally{
// Close database connection
if (con!=null){
try {
con.close();
}catch(Exception e){
System.out.println(e);
}
}
}
}

When fields under a certain nested subdocument have different structures, we can dynamically get the desired fields using union, intersection or other operation as needed. An easier alternative is to use a user-defined field to format data into a table sequence. The latter method makes it more convenient to further process data. Read How to Call an SPL Script in Java to learn more about integration of esProc script into a Java program.

--

--

No responses yet