Written by Super User.

28- Django Annotate and Aggregate Queryset Methods

In the last article, we created a Dashboard by using ChartJS and we mostly needed to use Annotate and Aggregate Queryset methods. Let's have a closer look at these 2 methods from simple to more complex queries.

Aggregate operate over the rowset to get a single value from the rowset.(For example, sum of all prices in the rowset). Aggregate is applied on entire QuerySet and it generate result (summary) values over an entire QuerySet.

Annotate generate an independent group  for each object in a QuerySet. (For example, counting like and dislike votes for each movie)

 

Let's work on examples. This the first table that I am using for my examples.

class Movie(models.Model):
    title= models.CharField(max_length=100)
    gender= models.CharField(max_length=20)
    description= models.CharField(max_length=1000, blank=True, null=True)
    moviefile= models.FileField(upload_to='movie', blank=True, null=True)
    photo= models.FileField(upload_to='photo',blank=True, null=True)
    publishDate= models.DateField(blank=True, null=True)
    imdbID= models.CharField(max_length=10, blank=True, null=True)
    movieapp_score= models.FloatField(blank=True, null=True)
    date_added= models.DateField(auto_now=True)
    numberOfViews= models.IntegerField(default=0)
    duration= models.FloatField(blank=True, null=True)
    resolution= models.CharField(max_length=5)
    status = models.BooleanField(default=False)
    users = models.ManyToManyField(MyCustomUser, through='UserComment')
    usermovielist = models.ManyToManyField(MyCustomUser, related_name="usermovielist")
    numberOfLikes = models.IntegerField(blank=True, null=True)
    numberOfDislikes = models.IntegerField(blank=True, null=True)
    
    def __str__(self):
        return self.title
    class Meta:
        ordering = ('id',)

 

On your terminal run shell and import your database tables like below. Django has a count function which returns a number value.

python manage.py shell
from moviesapp.models import Movie
#Now I can run my queries 
>>> Movie.objects.count()
14

 

Aggregate: (calculating result from entire rows)

Instead of using Django count function, we can use aggregate method to count the objects like this. Note that, result is a dictionary.

>>> from django.db.models import Count
>>> Movie.objects.aggregate(Count('id'))
{'id__count': 14}

 

The key value in the dictionary "id__Count" is generated automatically. We can specify the key ifwe want it to have a specific name like this.

 

>>> from django.db.models import Count
>>> Movie.objects.aggregate(myCount=Count('id'))
{'myCount': 14}

 

Aggregate generates results for the entire table and sometimes we may need to run the query on specific group of rows. This is when we use annotate. We can group the query result by looking at their values.

Movie table has a field named status. Status of a movie might be True or False (Published or Unpublished). So I have 5 movies that are Unpublished and 9 movies are Published.

values: what table field to group

annotate: what to aggregate

#This query counts the number of objects by grouping their status value (True or False)
>>> Movie.objects.values('status').annotate(myCount=Count('id'))
<QuerySet [{'status': False, 'myCount': 5}, {'status': True, 'myCount': 9}]>


#The query below checks the values of resolution field and group them and counts the number of objects for each group
>>> Movie.objects.values('resolution').annotate(myCount=Count('id'))
<QuerySet [{'resolution': '480p', 'myCount': 2}, {'resolution': '720p', 'myCount': 6}, {'resolution': '1080p', 'myCount': 6}]>

#I can get the number of movies for each gender by using the same query
>>> Movie.objects.values('gender').annotate(myCount=Count('id'))
<QuerySet [{'gender': 'War', 'myCount': 4}, {'gender': 'Drama', 'myCount': 5}, {'gender': 'Mind Blowing', 'myCount': 5}]>

 

 

Let me import an intermediate table from my models and run similar queries on that. This is the model

class Watched(models.Model):
    user = models.ForeignKey('MyCustomUser', blank=True, null=True, related_name='Watched_User', on_delete=models.CASCADE)
    movie = models.ForeignKey('Movie',blank=True, null=True, related_name='Watched_Movie', on_delete=models.CASCADE)
    date_watched = models.DateField(auto_now=True)
    def __str__(self):
        return self.user.username
    class Meta:
        ordering = ('id',)

 

Here I can group objects by their watched date value and count each groups of dates.

>>> from moviesapp.models import Watched
>>> Watched.objects.values('date_watched').annotate(myCount=Count('id'))
<QuerySet [{'date_watched': datetime.date(2022, 11, 8), 'myCount': 14}, {'date_watched': datetime.date(2022, 9, 19), 'myCount': 3}, {'date_watched': datetime.date(2022, 11, 12), 'myCount': 2}]

 

Filtering with Group By:

#The query groups by looking at their resolution
#includes only published movies (filters)
#and counts them (aggregates). Returns a key: value pair foreach group
>>> Movie.objects.values('resolution').filter(status=True).annotate(myCount=Count('id'))
<QuerySet [{'resolution': '480p', 'myCount': 2}, {'resolution': '720p', 'myCount': 4}, {'resolution': '1080p', 'myCount': 3}]>


 

Sorting with Group By:

>>> Movie.objects.values('resolution').annotate(myCount=Count('id')).order_by('resolution')
<QuerySet [{'resolution': '1080p', 'myCount': 6}, {'resolution': '480p', 'myCount': 2}, {'resolution': '720p', 'myCount': 6}]>

 

Group By an Expression (Extract):

The below queries returns montly and yearly movie watch counts

#The query groups by months from a datefield and counts them
>>> Watched.objects.values('date_watched__month').annotate(myCounter=Count('id'))
<QuerySet [{'date_watched__month': 9, 'myCounter': 3}, {'date_watched__month': 11, 'myCounter': 18}]>

#Or year
>>> Watched.objects.values('date_watched__year').annotate(myCounter=Count('id'))
<QuerySet [{'date_watched__year': 2022, 'myCounter': 21}]>

#Or both
>>> Watched.objects.values('date_watched__month', 'date_watched__year').annotate(myCounter=Count('id'))
<QuerySet [{'date_watched__month': 9, 'date_watched__year': 2022, 'myCounter': 3}, {'date_watched__month': 11, 'date_watched__year': 2022, 'myCounter': 18}]>

 

Group By Across Relations:

Movie and Watched_Movie are 2 different models. I count movie objects by grouping by their watched date.

>>> Movie.objects.values('Watched_Movie__date_watched').annotate(mycounter=Count('id'))
<QuerySet [{'Watched_Movie__date_watched': None, 'mycounter': 2}, {'Watched_Movie__date_watched': datetime.date(2022, 11, 8), 'mycounter': 14}, {'Watched_Movie__date_watched': datetime.date(2022, 9, 19), 'mycounter': 3}, {'Watched_Movie__date_watched': datetime.date(2022, 11, 12), 'mycounter': 4}]>

 

 

Queries Used In ChartJS:

 MOST WATCHED 5 MOVIES

#########################################################################
#QUERIES USED FOR TOP5 MOVIES
#########################################################################
    topFiveMovie = Movie.objects.annotate(myCounter=Count('Watched_Movie')).order_by('-myCounter')[:5]
    topFiveWatched = Watched.objects.values('movie__id').annotate(myCounter=Count('id')).order_by('-myCounter')[:5]

#########################################################################
#EXPLANATION:
#########################################################################
#Each movie object in Watched_Movie table is grouped and counted. 
#First 5 objects returned from Movie table. It is put in an order by their myCounter value. See the output from shell. 
>>> Movie.objects.annotate(myCounter=Count('Watched_Movie')).order_by('-myCounter')[:5]
<QuerySet [<Movie: Iron Man>, <Movie: Brave Heart>, <Movie: Gladiator>, <Movie: Fight Club>, <Movie: The Mummy>]>

#then I assigned the returned queryset in a variable named topFiveMovie
#Second query groups objects in Watched table by movie id from Movie table and counts them
#See the output from shell.

>>> Watched.objects.values('movie__id').annotate(myCounter=Count('id')).order_by('-myCounter')[:5]
<QuerySet [{'movie__id': 23, 'myCounter': 11}, {'movie__id': 3, 'myCounter': 6}, {'movie__id': 5, 'myCounter': 6}, {'movie__id': 4, 'myCounter': 3}, {'movie__id': 19, 'myCounter': 2}]>

#Finally I passed topFiveMovie and topFiveWatched to my template and there I could access their corresponding values
labels: [ {% for t in topFiveMovie %} "{{t.title}}", {% endfor %} ],
datasets: [{
label: 'Most Watched Movies',
data: [ {% for n in topFiveWatched %} {{n.myCounter}}, {% endfor %} ],

 

HIGHEST MOVIEAPP RATINGS:

#########################################################################
#QUERIES USED FOR HIGHEST RATED MOVIES
#########################################################################

AppScoreValues= Movie.objects.order_by('-movieapp_score').values_list('movieapp_score', flat=True).exclude(movieapp_score__isnull=True)
MovieAppScore= Movie.objects.order_by('-movieapp_score').filter(movieapp_score__in=AppScoreValues[:5])
#########################################################################
#EXPLANATION:
#########################################################################
#Query gets the values in MoviesApp_Scores (high to low), then returned the values kept in a variable (AppScoreValues).Null values are excluded. Thanks to Flat=True, we get singlevalues not tuples(for example <QuerySet [1, 2]> ).
#In the second query,5 highest rated Movie objects are fetched that correspond  to AppScoreValues.
#See the shell output for these 2 queries:

>>> AppScoreValues=Movie.objects.order_by('-movieapp_score').values_list('movieapp_score', flat=True).exclude(movieapp_score__isnull=True)
<QuerySet [9.2, 8.9, 8.8, 7.8, 7.3, 7.0, 7.0, 6.7, 5.0]>

>>> Movie.objects.order_by('-movieapp_score').filter(movieapp_score__in=AppScoreValues[:5])
<QuerySet [<Movie: Saving Private Ryan>, <Movie: Forrest Gump>, <Movie: Fight Club>, <Movie: The Hobbit: An Unexpected Journey>, <Movie: Troy>]>

 

NUMBER OF LIKES and DISLIKES

    #Same as the above query for different field (number of likes)
    numberOfLikesValues = Movie.objects.order_by('-numberOfLikes').values_list('numberOfLikes', flat=True).exclude(numberOfLikes__isnull=True)
    likes = (Movie.objects.order_by('-numberOfLikes').filter(numberOfLikes__in=numberOfLikesValues[:3]))

    #Same as the above query for different field (number of dislikes)
    numberOfDislikesValues = Movie.objects.order_by('-numberOfDislikes').values_list('numberOfDislikes', flat=True).exclude(numberOfDislikes__isnull=True)
    dislikes = Movie.objects.order_by('-numberOfDislikes').filter(numberOfDislikes__in=numberOfDislikesValues[:3])

#See the Shell outputs:
>>> numberOfLikesValues=Movie.objects.order_by('-numberOfLikes').values_list('numberOfLikes', flat=True).exclude(numberOfLikes__isnull=True)
<QuerySet [70, 60, 56, 54, 47, 15, 7, 3]>

>>> (Movie.objects.order_by('-numberOfLikes').filter(numberOfLikes__in=numberOfLikesValues[:3]))
<QuerySet [<Movie: Saving Private Ryan>, <Movie: Brave Heart>, <Movie: The Hobbit: An Unexpected Journey>]>


>>> numberOfDislikesValues = Movie.objects.order_by('-numberOfDislikes').values_list('numberOfDislikes', flat=True).exclude(numberOfDislikes__isnull=True)
<QuerySet [40, 5, 5, 4, 3, 2, 2, 2]>

>>> Movie.objects.order_by('-numberOfDislikes').filter(numberOfDislikes__in=numberOfDislikesValues[:3])
<QuerySet [<Movie: The Mummy>, <Movie: Gladiator>, <Movie: Brave Heart>]>

 

MONTHLY AND YEARLY TOTAL NUMBER OF MOVIE VIEWS

watchByDate= Watched.objects.values('date_watched__month', 'date_watched__year').annotate(myCounter=Count('id'))
#########################################################################
#EXPLANATION:
#########################################################################
#Objects from Watched table are grouped by date_Watched field. To extract month and year from thedate field I used date_watched__month and date_watched__year. 
#Finally I counted the objects in myCounter variable. 
#Shell Output:

>>> Watched.objects.values('date_watched__month', 'date_watched__year').annotate(myCounter=Count('id'))
<QuerySet [{'date_watched__month': 9, 'date_watched__year': 2022, 'myCounter': 3}, {'date_watched__month': 11, 'date_watched__year': 2022, 'myCounter': 34}]>


#On my template, I could access the objects and values like this:
               
labels: [ {% for label in watchByDate %} {{label.date_watched__month}} +"-"+ {{label.date_watched__year}}, {% endfor %}],
datasets: [{
label: 'Movies Displayed',
data: [ {% for data in watchByDate %} {{data.myCounter}}, {% endfor %}],

 

Thanks for reading.