Call for Grand Challenge Solutions

The DEBS Grand Challenge is a series of competitions, that started in 2010, in which both academics and professionals compete with the goal of building faster and more accurate distributed and event-based system. Every year, the DEBS Grand Challenge participants have a chance to explore a new data set and a new problem and can compare their results based on the common evaluation criteria. The winners of the challenge are announced during the conference. The 2020 DEBS Grand Challenge focuses on Non-Intrusive Load Monitoring (NILM). The goal of the challenge is to detect when appliances contributing to an aggregated stream of voltage and current readings from a smart meter are switched on or off. NILM is leveraged in many contexts, ranging from monitoring of energy consumption to home automation. The 2020 Grand Challenge evaluation platform is based on the HOBBIT project.

Awards and Selection Process

Participants of the challenge compete for two awards: (1) the performance award and (2) the audience award. The winner of the performance award will be determined through the automated evaluation platform, according to the evaluation criteria specified below. Evaluation criteria measure the speed and correctness of submitted solutions. The winning team of the performance award will receive 1000 USD as prize money. The winner of the audience award will be determined amongst the finalists who present in the Grand Challenge session of the DEBS conference. In this session, the audience will be asked to vote for the solution with the most interesting concepts. The solution with the highest number of votes wins. The intention of the audience award is to highlight the qualities of the solutions that are not tied to performance. Specifically, the audience and challenge participants are encouraged to consider the following aspects:

  • The originality of the solution
  • Quality of the solution architecture (e.g. flexibility, re-usability, extensibility, generality, …)
  • There are two ways for teams to become finalists and get a presentation slot in the Grand Challenge session during the DEBS Conference: (1) up to two teams with the best performance (according to the final evaluation) will be nominated; (2) the Grand Challenge Program Committee will review submitted papers for each solution and nominate up to two teams with the most novel concepts. All submissions of sufficient quality that do not make it to the finals will get a chance to be presented at the DEBS conference as posters. The quality of the submissions will be determined based on the review process performed by the DEBS Grand Challenge Program Committee.


    The data provided for the challenge consists of energy measurements from a smart meter. The schema of the input tuple is thus < i,v,c >, where attributes i, v and c represent the tuple sequence id, the voltage and the current, respectively. Participants get access to a sample data set after they register in HotCRP. The links will be shared by the GC chairs upon registration. From the 14th of January, the full dataset is available for participants that register to the challenge via HotCRP (

    Query 1

    The first query aims at detecting devices that are turned on or off in a stream of voltage and current readings from a smart meter, resembling the aggregated energy consumption in an office building. The processing steps are described in the following.

    1. Each input tuple is first aggregated using a tuple-based window W1 of size and advance 1000 to compute the active and reactive power features. These are computed as follows. Active Power P = \sum (v \times c) / 1000. Apparent Power S = voltage_RMS \times current_RMS, with root-mean-square (RMS) values of voltage and current per period respectively. Reactive Power Q= \sqrt{S^2 - P^2}
    2. The resulting stream of features is then processed based on the algorithm described in the paper “Sequential clustering-based event detection for Non-Intrusive Load Monitoring” (Barsim, Karim Said, and Bin Yang. Computer Science & Information Technology 10, 2016). More concretely, a tuple-based window W2 of varying size is maintained and
      1. Each new pair of features (active and reactive power) are added to window W2.
      2. The DBSCAN algorithm is applied to the window: it has a forward and a backward pass.
        1. Forward pass:
          1. The event model constraints are checked.
          2. Clustering loss is computed. If model constraints are valid and loss is below a threshold, an event is detected. In this case the backward pass is started. Else, the next tuple is added to window W2 and the algorithm continues with step 1.
        2. Backward pass, if an event is detected in the forward pass:
          1. remove the oldest data tuple in each iteration and apply the DBSCAN clustering. We do this, as long as the event we detected previously in the forward pass is still detected. If it is not detected any more, we stop and return the resulting event. By doing so, we get more stable steady-state sections from the algorithm.
      3. If an event is not detected and window W2 contains more than 100 elements, the window is then emptied.

    For each 1000th input tuple (i.e., each time window W1 is full), the query should produce a result. Such result specifies whether an event has been detected for such window and the sequence number of the window W1 for which the event is detected. The schema of the output stream is thus: < s,d,event_s >, where s is the window W1 input tuple sequence id to which this output tuple refers to, d is a boolean attribute that specifies whether an event is detected or not and event_s is the window W1 sequence id of the detected event (if any).

    Query 2

    The second query is a variation of Query 1. More concretely, this query is expected to process an input stream that can contain both late arrivals as well as missing tuples. Since the semantics of Query 2 are equal to those of Query 1, participants are expected to provide a solution that is able to trade-off timeliness and accuracy of results. Upon reception of an input tuple, the proposed solution can produce an output (based only on the data observed so far) or decide to postpone the output after late arriving input tuples are received. In this second case, the solution can wait for possibly late arrivals to provide an output that more accurately identifies the timestamp at which an event is detected (if any).  In order to differentiate between missing tuples and late arrivals, notice that late tuples are bound to 20 windows (20000 time units). The results produced by the submitted solutions will be compared with those produced by a baseline that observes all input data in order (i.e., with no missing tuple nor late arrivals) and scored accordingly to the process described in the remainder. Please notice that each output tuple should be produced no more than once.

    Evaluation for Query 1

    Evaluation of Query 1 addresses two aspects: (1) correctness of results and (2) processing speed. The first is taken into account by comparing the results of a proposed solution with that of our baseline. Only solutions that produce correct results (i.e., that produce the same set of output tuples produced by our baseline and in the same order) are considered as valid. The second aspect is captured with multiple measures, the total runtime (rank_0) and the latency (rank_1). The specifics of the ranking for the processing speed and quality of results are defined as follows. The total runtime (rank_0) is the time span between the sending of the first input tuple and the reception of the result for the last input tuple. The lower the total runtime measure the higher the position in the ranking. The latency (rank_1) is measured as the average time span between retrieving an input tuple and providing the corresponding output tuple. The lower the latency the higher the position in the ranking.

    Evaluation for Query 2

    Evaluation of Query 2 addresses two aspects: (1) timeliness of produced results (rank_3) and (2) accuracy (rank_4). More concretely, being:

  • tout the expected tuple produced by a baseline that is fed with all input data in order upon processing of input tuple tin, and
  • t’in the latest input tuple retrieved by a submitted solution when the output tuple t’out is produced, with t’in possibly larger than tin, the timeliness of each output tuple t’out is computed as: max(0,1 - ( t’in.i - tin.i ) / 10.0). while its accuracy is computed, for output tuples in which an event is detected by the baseline as: max(0,1 - abs( t’out.event_s - tout.event_s ) / 10.0). The overall final ranks are based on the sum of all tuples’ scores. The higher the cumulative score, the higher the position in the ranking.
  • Overall Evaluation

    The overall final rank is calculated as the sum of all rankings for both queries. The solution with the lowest overall final rank wins the performance award. The paper review score will be used in case of ties.

    Benchmarking System

    For local evaluation/testing you can download a docker compose based setup here. Questions regarding the data set and the evaluation platform should be posted in the issue tracker of the aforementioned git repository. A baseline implementation for both queries is made available via the provided docker image.


    Participation in the DEBS 2020 Grand Challenge consists of three steps: (1) registration, (2) iterative solution submission, and (3) paper submission. The first step is to pre-register your submission in the HotCRP Grand Challenge track, at the following link: Pre-registration in HotCRP is necessary to (1) state the intent of a team to participate in the Grand Challenge and to establish a communication channel with the Grand Challenge organizers and (2) get access to the challenge input data. At this step, it is sufficient to start a new submission, register the members of the team as authors and submit an interims title for your work. Participants are not requested to submit a PDF at this stage. Solutions to the challenge, once developed, must be submitted to the evaluation platform in order to get it benchmarked in the challenge. The evaluation platform provides detailed feedback on performance and allows to update the solution in an iterative process. A solution can be continuously improved until the challenge closing date. Evaluation results of the last submitted solution will be used for the final performance ranking. The last step is to upload a short paper (minimum 2 pages, maximum 6 pages) describing the final solution to the HotCRP system. All papers will be reviewed by the DEBS Grand Challenge Program Committee to assess the merit and originality of submitted solutions. All solutions of sufficient quality will be presented during the poster session at the DEBS 2020 conference.

    Important Dates

  • Release of the challenge, initial data set, and an offline docker container with the benchmarking system for local testing December 9th, 2019
  • Release of a full dataset January 14th, 2020
  • Evaluation platform is online February 14th, 2020
  • Deadline for uploading final solution to the evaluation platform April 7th April 24th 2020, 11pm CET
  • Deadline for short paper submission April 14th May 1st 2020, 11pm CET
  • Notification of acceptance May 1st May 15th, 2020
  • Important Dates

    Events Dates
    Tutorial Proposal Submission March 3rd March 25th, 2020
    Abstract Submission for Research Track March 13th March 27th, 2020
    Research and Industry Paper Submission March 20th April 3rd, 2020
    Grand Challenge Solution Submission April 7th April 24th, 2020
    Author Notification Industry Track May 3rd, 2020
    Author Notification Research Track May 3rd May 18th, 2020
    Doctoral Symposium Submission May 10th May 18th, 2020
    Poster & Demo Submission May 18th May 25th, 2020
    Camera Ready for All Tracks May 28th June 4th, 2020
    Conference July 13th – 17th, 2020