Inheritance#2

Draft
kuraisle wants to merge 7 commits into main from inheritance


@kuraisle
Member

♻️ Refactor
✨ Feature

PR Description

We talked about two designs: an atomic class for each statistic aggregation, or a single class with None defaults for every possible value that raises an error when you call a method for a statistic the data can't support.
The fact that we had such divergent opinions led me to believe that we were both wrong.
This is my attempt to do it right.

Protocols

We want the data held by a class to be compatible with multiple final results.
For example, anything that holds the data to compute a sample's variance can also calculate the count, sum, and mean.
Python will just do this anyway through duck-typing, but your LSP will shout at you with the existing version.
I've used protocols here for structural typing.
As the data for variance holds the data for count, sum, and mean, the VariancePartialProtocol inherits from the preceding protocols.
The LSP is then happy.
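A rough sketch of the protocol chain described above. Only VariancePartialProtocol and VariancePartial are names from this PR; CountPartialProtocol, MeanPartialProtocol, total_count and the exact attribute layout are assumptions made for illustration, not the actual code in the branch.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


class CountPartialProtocol(Protocol):  # hypothetical name
    @property
    def count(self) -> int: ...


class MeanPartialProtocol(CountPartialProtocol, Protocol):  # hypothetical name
    @property
    def sum(self) -> float: ...


# Anything that can give a variance can also give a count, sum and mean,
# so this protocol extends the ones before it.
class VariancePartialProtocol(MeanPartialProtocol, Protocol):
    @property
    def sum_of_squares(self) -> float: ...


@dataclass(frozen=True)
class VariancePartial:
    # Assumed fields; no inheritance from the protocols is needed.
    count: int
    sum: float
    sum_of_squares: float


def total_count(parts: Sequence[CountPartialProtocol]) -> int:
    """Accepts any object exposing .count, including variance partials."""
    return sum(part.count for part in parts)


parts = [VariancePartial(2, 3.0, 5.0), VariancePartial(3, 12.0, 50.0)]
print(total_count(parts))
```

Because the matching is structural, a type checker should accept VariancePartial wherever the narrower count protocol is asked for.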

Composite Aggregators

So much for the protocols, but we need concrete objects that can do the work.
The stat_aggregators module exposes StatAggregators, which hold data, and have properties for calculating the relevant statistics.
I think this provides a convenient API for the underlying logic.
There's a small amount of repetition, but the use of protocols for the types in combiners keeps this pretty short.

@kuraisle
Member Author

kuraisle commented Jun 2, 2026

How parts depend on one another

The core of this is the combine method. This is defined on the CombinerProtocol as a function that takes a list of one type and returns a second type.
The classes Combiner and SumCombiner implement it using build_combine_function, which takes two functions: one (aggregate) that adds two things of the same type together and returns that type, and one that takes something of that type and returns another.

The statistical and scalar combiners are then instances of these, supplied with specific methods that mean they can be used to calculate the named statistic.
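To make the idea concrete, here is a minimal sketch of what a function like build_combine_function could look like. The signature, the parameter name finalise, the empty-input check and the tuple-based example are all assumptions for illustration; the version in this branch may differ.

```python
from functools import reduce
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def build_combine_function(
    aggregate: Callable[[T, T], T],
    finalise: Callable[[T], R],
) -> Callable[[Sequence[T]], R]:
    """Return a combine function: fold the items pairwise, then finalise."""

    def combine(items: Sequence[T]) -> R:
        if not items:
            # Assumed behaviour: refuse to combine nothing.
            raise ValueError("cannot combine an empty sequence")
        return finalise(reduce(aggregate, items))

    return combine


# Hypothetical usage with (count, sum) pairs as the partial type.
Pair = Tuple[int, float]


def add_pairs(a: Pair, b: Pair) -> Pair:
    return (a[0] + b[0], a[1] + b[1])


def mean_of(pair: Pair) -> float:
    return pair[1] / pair[0]


combine_mean = build_combine_function(add_pairs, mean_of)
print(combine_mean([(2, 3.0), (3, 12.0)]))  # 15.0 / 5 -> 3.0
```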

graph TD
  Combiner -- implements --> CombinerProtocol
  SumCombiner -- implements --> CombinerProtocol
  Combiner -- uses --> build_combine_function
  SumCombiner -- uses --> build_combine_function
  statistical_combiners -- instances of --> SumCombiner
  scalar_combiners -- instances of --> Combiner

What's with the protocols?

The combiner instances have combine methods whose input types are protocols. This means the methods work on any object that exposes the right properties. Want to use a MeanPartial for the mean_combiner? Great. Want to use a VariancePartial? No problem. Want to use your own object that gets .count and .sum from somewhere? Also fine.

This extends to the StatAggregators. They have a data attribute, which is a list of things satisfying an interface. We have the concrete classes for these in the partials module, but you don't have to use them. Nice and flexible.
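As an illustration of the "your own object" case, here is a sketch that mixes a user-defined class with a plain dataclass. combine_mean is a stand-in written for this example, not the real mean_combiner, and CsvSummary and the MeanPartial fields are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Protocol, Sequence


class MeanPartialProtocol(Protocol):  # assumed shape of the protocol
    @property
    def count(self) -> int: ...

    @property
    def sum(self) -> float: ...


@dataclass(frozen=True)
class MeanPartial:
    # Assumed fields for the concrete partial.
    count: int
    sum: float


class CsvSummary:
    """Hypothetical user object that derives count and sum from raw values."""

    def __init__(self, values: Sequence[float]) -> None:
        self._values = list(values)

    @property
    def count(self) -> int:
        return len(self._values)

    @property
    def sum(self) -> float:
        return math.fsum(self._values)


def combine_mean(parts: Sequence[MeanPartialProtocol]) -> float:
    """Stand-in for a mean combiner: pooled sum divided by pooled count."""
    total_count = 0
    total_sum = 0.0
    for part in parts:
        total_count += part.count
        total_sum += part.sum
    return total_sum / total_count


mixed = [CsvSummary([1.0, 2.0, 3.0]), MeanPartial(count=1, sum=6.0)]
print(combine_mean(mixed))  # 12.0 / 4 -> 3.0
```

Neither class inherits from the protocol; exposing .count and .sum is enough.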

Dependency overview

graph TD
  subgraph partials
    protocol(Partial protocols)
    mean(Mean partial)
    variance(Variance partial)
  end
  subgraph combiners
    core
    scalar
    statistical
  end
  stat_aggregators
  scalar -- implements --> core
  statistical -- implements --> core
  mean -- implements --> protocol
  variance -- implements --> protocol
  statistical -- uses --> protocol
  stat_aggregators -- uses --> protocol
  stat_aggregators -- uses --> statistical

Existing API

The API I wish to define should provide some out-of-the-box classes for common statistics that can either be used in another class or be used in notebooks. It should be easy to extend, either using the existing combiners to calculate federated statistics on user-defined objects, or to provide (with the Combiner) a convenient way to define functions to federate user-defined statistics.

stat_aggregators

These should be super easy to use! For example, if you have a function that reads a file into a VariancePartial:

# reader_fun :: Path -> VariancePartial

aggregator = VarianceStatAggregator(data=[reader_fun(path) for path in paths])

aggregator.count  # total count for the dataset
aggregator.sum  # sum for the dataset
aggregator.mean  # mean for the dataset
# etc.

If you already have a list of objects that have count, sum and sum_of_squares attributes, you can pass them to the aggregator and get the same properties.
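For a feel of what sits behind those properties, here is a sketch of how an aggregator could pool count, sum and sum_of_squares. SketchVarianceAggregator, HasVarianceData and Partial are hypothetical stand-ins, and using the sample variance (n - 1 denominator) is an assumption; the real VarianceStatAggregator may compute things differently.

```python
import math
from dataclasses import dataclass
from typing import Protocol, Sequence


class HasVarianceData(Protocol):  # hypothetical protocol name
    @property
    def count(self) -> int: ...

    @property
    def sum(self) -> float: ...

    @property
    def sum_of_squares(self) -> float: ...


@dataclass(frozen=True)
class Partial:  # hypothetical concrete partial
    count: int
    sum: float
    sum_of_squares: float


@dataclass
class SketchVarianceAggregator:
    data: Sequence[HasVarianceData]

    @property
    def count(self) -> int:
        return int(math.fsum(p.count for p in self.data))

    @property
    def sum(self) -> float:
        return math.fsum(p.sum for p in self.data)

    @property
    def sum_of_squares(self) -> float:
        return math.fsum(p.sum_of_squares for p in self.data)

    @property
    def mean(self) -> float:
        return self.sum / self.count

    @property
    def variance(self) -> float:
        # Assumption: sample variance, (sum(x^2) - sum(x)^2 / n) / (n - 1).
        n = self.count
        if n < 2:
            raise ValueError("need at least two observations")
        return (self.sum_of_squares - self.sum ** 2 / n) / (n - 1)


# Two partials summarising the values [1, 2] and [3, 4, 5].
agg = SketchVarianceAggregator(data=[Partial(2, 3.0, 5.0), Partial(3, 12.0, 50.0)])
print(agg.count, agg.mean, agg.variance)  # 5 3.0 2.5
```

This naive formula can lose precision when the mean is large relative to the spread, so treat it as illustrative only.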

x_combiner

If you don't want to use the stat aggregators, you can also use the combiners directly:

# reader_fun :: Path -> VariancePartialProtocol
data = [reader_fun(path) for path in paths]

count_combiner.combine(data)
sum_combiner.combine(data)
mean_combiner.combine(data)
# etc.

Combiner

Want your own stat? Cool

def some_statistic(...):
  ...

tt_combiner = SumCombiner(finalise=some_statistic)
data = [reader_fun(path) for path in paths]
tt_combiner.combine(data)
