API documentation

MPC

This part contains the implementation of model predictive control (MPC).

Main code

The main MPC code uses cvxpy with the Gurobi solver to pose the problem as a convex optimization problem. Several versions of the MPC code are used to solve the problem in different scenarios, such as hourly pricing and net metering.
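
As an illustration only, the sketch below shows how a single-day horizon could be posed in cvxpy and handed to Gurobi; the horizon length, battery limits, efficiency, and placeholder data are assumptions and do not come from the project code.

    import cvxpy as cp
    import numpy as np

    # Hypothetical one-day horizon with placeholder price, load and solar data.
    T = 24
    price = np.random.rand(T)           # $/kWh, illustrative values
    load = 2.0 * np.random.rand(T)      # household load per hour (kWh)
    solar = 1.5 * np.random.rand(T)     # solar generation per hour (kWh)

    charge = cp.Variable(T)             # battery charging per hour
    discharge = cp.Variable(T)          # battery discharging per hour
    soc = cp.Variable(T + 1)            # battery state of charge

    constraints = [
        soc[0] == 0.5,                                   # assumed initial SoC
        soc[1:] == soc[:-1] + 0.9 * charge - discharge,  # assumed charge efficiency
        soc >= 0, soc <= 10,                             # assumed capacity (kWh)
        charge >= 0, charge <= 3,                        # assumed charge rate limit
        discharge >= 0, discharge <= 3,                  # assumed discharge rate limit
    ]

    # Pay the hourly price for any power drawn from the grid.
    grid_import = cp.pos(load + charge - solar - discharge)
    problem = cp.Problem(cp.Minimize(price @ grid_import), constraints)
    problem.solve(solver=cp.GUROBI)     # requires a Gurobi installation and licence
    print(problem.value)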

MPC.MPC2.cleanup(val)

Round very small values to 0

Parameters:val – learned policy
Returns:None
MPC.MPC2.compute(hour_var, battery_var, last)

Compute the best policy for the next day

Parameters:
  • hour_var – index of the hour of the year
  • battery_var – current battery SoC
  • last – binary value indicating whether this is the last episode
Returns:

None

MPC.MPC2.init_ground_truth(datafile)

Initialise household data, solar data and price data

Parameters:datafile – path to the input file
Returns:None
MPC.MPC2.predict_day(start)

Load new observations and predict the home use and solar generation of the next day

Parameters:start – index of hour of a year
Returns:use_list: predicted home use of the next day; ac_list: predicted AC output of the next day
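
A hypothetical calling sequence for the functions above is sketched below; the data path, loop bounds, and the assumption that predict_day returns a (use_list, ac_list) pair are illustrative readings of the docstrings, not verified behaviour.

    from MPC import MPC2

    # Load household, solar and price data once (assumed file path).
    MPC2.init_ground_truth("data/household_year.csv")

    battery_soc = 0.0                        # assumed initial state of charge
    for hour in range(0, 24 * 365, 24):      # step through the year one day at a time
        use_list, ac_list = MPC2.predict_day(hour)     # forecast next-day use and AC output
        last = hour >= 24 * 364                        # final episode flag
        MPC2.compute(hour, battery_soc, last)          # solve for the next day's policy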

Create hourly price

Create hourly price converts Unix time to UTC time and generates an hourly price table.

MPC.create_hourly_price_table.create()

Create the hourly price table: read the raw data with Unix timestamps and convert them to UTC time with the proper time zone (a conversion sketch follows below)

Returns:None
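
A minimal sketch of the conversion described above, using pandas; the file name, column names, and time zone are assumptions.

    import pandas as pd

    # Hypothetical raw price file with a Unix-timestamp column and a price column.
    raw = pd.read_csv("raw_prices.csv")
    raw["utc_time"] = pd.to_datetime(raw["unix_time"], unit="s", utc=True)

    # Shift to an assumed local time zone, then aggregate to one price per hour.
    raw["local_time"] = raw["utc_time"].dt.tz_convert("Australia/Sydney")
    hourly = raw.set_index("local_time")["price"].resample("1H").mean()
    hourly.to_csv("hourly_price_table.csv")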

Create TOU price

Create TOU price creates a time-of-use (TOU) price table for a year so that it can be loaded as a matrix in the solver.
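
For example, a year-long TOU table can be stored as a (365, 24) matrix that the solver indexes by day and hour; the tariff values and peak window below are assumptions.

    import numpy as np

    # Assumed TOU tariff: peak price between 14:00 and 20:00, off-peak otherwise.
    OFF_PEAK, PEAK = 0.10, 0.40            # $/kWh, illustrative values only
    daily = np.full(24, OFF_PEAK)
    daily[14:20] = PEAK

    # One row per day of the year, so the solver can look up price[day, hour].
    tou_table = np.tile(daily, (365, 1))
    np.savetxt("tou_price_table.csv", tou_table, delimiter=",")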

RL

Implementation of the reinforcement learning method.

A2C

A2C is an implementation of the actor-critic method, which generates the strategy given states and rewards.

class RL.solar_a2c_nonlinear.PolicyEstimator(learning_rate=0.01, scope='policy_estimator')

Bases: object

Policy Function approximator.

predict(state, sess=None)

Predict the action given a state

Parameters:
  • state – current state
  • sess – tensorflow session
Returns:

Updated session

update(state, target, action, sess=None)

Update the policy function

Parameters:
  • state – current state
  • target – td target
  • action – learned action
  • sess – tensorflow session
Returns:

loss

class RL.solar_a2c_nonlinear.ValueEstimator(learning_rate=0.1, scope='value_estimator')

Bases: object

Value Function approximator.

predict(state, sess=None)

Predict the value given a state

Parameters:
  • state – current state
  • sess – tensorflow session
Returns:

Updated session

update(state, target, sess=None)

Update the value function

Parameters:
  • state – current state
  • target – td target
  • sess – tensorflow session
Returns:

loss

RL.solar_a2c_nonlinear.actor_critic(env, month_var, battery_var, estimator_policy, estimator_value, num_episodes, discount_factor=1.0)

Actor Critic Algorithm. Optimizes the policy function approximator using policy gradient.

Parameters:
  • env – OpenAI environment.
  • month_var – index of month
  • battery_var – current battery SoC
  • estimator_policy – Policy Function to be optimized
  • estimator_value – Value function approximator, used as a critic
  • num_episodes – Number of episodes to run for
  • discount_factor – Time-discount factor
Returns:

An EpisodeStats object with two numpy arrays for episode_lengths and episode_rewards.
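
A hypothetical training call combining the estimators above with the environment documented below; the constructor arguments, month index, initial SoC, and episode count are illustrative, and whether actor_critic picks up the default TensorFlow session is an assumption.

    import tensorflow as tf
    from RL.environment import EnergyEnvironment
    from RL.solar_a2c_nonlinear import PolicyEstimator, ValueEstimator, actor_critic

    env = EnergyEnvironment(mode="ground_truth", charge_mode="TOU", payment_cycle=24)
    policy_estimator = PolicyEstimator(learning_rate=0.01)
    value_estimator = ValueEstimator(learning_rate=0.1)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # month_var=0 and battery_var=0.0 are placeholder starting conditions.
        stats = actor_critic(env, 0, 0.0, policy_estimator, value_estimator,
                             num_episodes=500, discount_factor=0.99)
        print(stats.episode_rewards[-10:])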

RL.solar_a2c_nonlinear.featurize_state(state)

RBF feature representation of a given state

Parameters:state – current state
Returns:state with new feature representation
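
RBF featurization of this kind is commonly built with scikit-learn's RBFSampler fitted on sampled observations; the sketch below shows that common pattern as an assumption about what featurize_state does (state dimension, gammas, and component counts are illustrative), not a copy of the project code.

    import numpy as np
    import sklearn.pipeline
    import sklearn.preprocessing
    from sklearn.kernel_approximation import RBFSampler

    # Fit a scaler and several RBF kernels on observations sampled from the
    # state space (here an assumed 3-dimensional state in [0, 1]).
    observation_examples = np.random.uniform(0.0, 1.0, size=(10000, 3))
    scaler = sklearn.preprocessing.StandardScaler().fit(observation_examples)

    featurizer = sklearn.pipeline.FeatureUnion([
        ("rbf1", RBFSampler(gamma=5.0, n_components=100)),
        ("rbf2", RBFSampler(gamma=2.0, n_components=100)),
        ("rbf3", RBFSampler(gamma=1.0, n_components=100)),
        ("rbf4", RBFSampler(gamma=0.5, n_components=100)),
    ])
    featurizer.fit(scaler.transform(observation_examples))

    def featurize_state(state):
        """Return the RBF feature representation of a single state."""
        scaled = scaler.transform([state])
        return featurizer.transform(scaled)[0]

    print(featurize_state([0.5, 0.2, 0.8]).shape)   # (400,)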

Environment

Environment formulates the problem as an MDP environment. It contains features to reset and initialise states, calculate the reward of a given state, and simulate the next state.

class RL.environment.EnergyEnvironment(mode='ground_truth', charge_mode='TOU', payment_cycle=24, datafile='test')

Bases: object

check_valid_action(action)

Check if the current action is a valid action given the constraints

Parameters:action – action of the current time slot
Returns:Binary value True or False
init_ground_truth()

Initialize the ground truth into states

Read solar generation data and house-load data and store them in a 2D array.

Returns:None
init_price()

Initialise the price table by reading the input file

Read it row by row and save it as an array

Returns:None
reset()

Initialise the state

Reset the current state to the end of the last episode

Returns:None
step(action)

The main function of the MDP environment. It reads in an action,

calculates the reward of this action, and returns the next state

Parameters:action – the action of the current state
Returns:total_need_grid: total power taken from the grid; return_state: the next state; binary value: whether the episode terminates
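
A hypothetical interaction loop with the environment; the action value is a placeholder, and unpacking step() into three values follows the docstring above but is not verified.

    from RL.environment import EnergyEnvironment

    env = EnergyEnvironment(mode="ground_truth", charge_mode="TOU", payment_cycle=24)
    env.reset()

    done = False
    while not done:
        action = 1.0                            # placeholder action, e.g. charge at 1 kW
        if not env.check_valid_action(action):  # fall back if the constraints reject it
            action = 0.0
        total_need_grid, next_state, done = env.step(action)
        print(total_need_grid)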

Baseline

Contains different rule-based baselines for different scenarios.

No solar panel

Get no solar gets the baseline strategy for when no solar panels and no battery are installed.

Baseline.get_nosolar.eval()

Evaluate the policy if it is already learned

Returns:None
Baseline.get_nosolar.pre_train(episodes)

Get the baseline if there is no solar panel installed

Parameters:episodes – Number of episodes to be run
Returns:None

No battery

Get no battery gets the baseline strategy for when solar is installed but no battery is installed.

Baseline.get_nostorage.eval()

Evaluate the policy if it is already learned

Returns:None
Baseline.get_nostorage.pre_train(episodes)

Learn a policy for the case where a solar panel is installed but no battery is installed. Sell solar power to the grid if there is a remainder after home use (see the sketch below)

Parameters:episodes – Number of episodes to be run
Returns:None
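
The selling rule described above can be written down directly; a minimal sketch with hypothetical names, not taken from the project code.

    def no_storage_action(home_use, solar_generation):
        """Solar but no battery: power the home from solar first and sell any
        remainder to the grid. Returns (grid_import, grid_export) in kWh."""
        surplus = solar_generation - home_use
        if surplus >= 0:
            return 0.0, surplus      # export the leftover solar
        return -surplus, 0.0         # import the shortfall from the grid

    print(no_storage_action(1.0, 2.5))   # (0.0, 1.5)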

Rule based policy

Get RBC gets a baseline strategy based on rule-based control under the TOU price.

Baseline.get_rbc.pre_train(episodes)

Learn the rule-based policy for a year: charge at the maximum rate when the solar tariff is low and discharge at the maximum rate when the solar tariff is high (see the sketch below)

Parameters:episodes – Number of episodes to be run
Returns:None
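
The rule itself is simple enough to sketch directly; the tariff thresholds and maximum rate below are assumptions, not values from the project.

    def rbc_action(tariff, low=0.15, high=0.30, max_rate=3.0):
        """Rule-based control: charge the battery at the maximum rate when the
        tariff is low, discharge at the maximum rate when it is high, otherwise
        do nothing. Thresholds and max_rate are illustrative values."""
        if tariff <= low:
            return max_rate       # charge
        if tariff >= high:
            return -max_rate      # discharge
        return 0.0

    print(rbc_action(0.10), rbc_action(0.40))   # 3.0 -3.0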

These three implementations call associated environments developed for these different scenarios.

DLC

Contains the implementation of the direct mapping method.

Mapping

Mapping maps the available features of each time slot to the optimal actions solved by the solver from oracle data.

Training neural network

DLC.tune.create_model(x_train, y_train, x_test, y_test)

Model-providing function: create a Keras model, with double curly brackets dropped in as needed.

Parameters:
  • x_train – train set
  • y_train – train target
  • x_test – test set
  • y_test – test target
Returns:

The return value has to be a valid Python dictionary with two customary keys:
  • loss – specify a numeric evaluation metric to be minimized
  • status – just use STATUS_OK and see the hyperopt documentation if that is not feasible
The last one is optional, though recommended:
  • model – specify the model just created so that it can be used again later.

DLC.tune.data()

Data providing function:

This function is separated from create_model() so that hyperopt won’t reload data for each evaluation run.

Returns:None
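
The "double curly brackets" and STATUS_OK convention above matches the hyperas/hyperopt tuning pattern; the sketch below illustrates that pattern with made-up data and an assumed search space, and is not the project's actual model.

    import numpy as np
    from hyperopt import STATUS_OK, Trials, tpe
    from hyperas import optim
    from hyperas.distributions import choice, uniform
    from keras.models import Sequential
    from keras.layers import Dense, Dropout


    def data():
        # Data-providing function: called once so the data is not reloaded
        # for every evaluation run.
        x = np.random.rand(1000, 10)
        y = np.random.rand(1000, 1)
        return x[:800], y[:800], x[800:], y[800:]


    def create_model(x_train, y_train, x_test, y_test):
        # Double curly brackets mark hyperparameters that hyperas substitutes
        # before each evaluation run.
        model = Sequential()
        model.add(Dense({{choice([32, 64, 128])}}, activation='relu', input_shape=(10,)))
        model.add(Dropout({{uniform(0, 0.5)}}))
        model.add(Dense(1))
        model.compile(loss='mse', optimizer='adam')
        model.fit(x_train, y_train, epochs=5, verbose=0)
        loss = model.evaluate(x_test, y_test, verbose=0)
        return {'loss': loss, 'status': STATUS_OK, 'model': model}


    if __name__ == '__main__':
        best_run, best_model = optim.minimize(model=create_model, data=data,
                                              algo=tpe.suggest, max_evals=10,
                                              trials=Trials())
        print(best_run)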