API documentation

MPC

This part contains the implementation of model predictive control (MPC).

Main code

The main MPC code uses cvxpy with the Gurobi solver to pose the problem as a convex optimization problem. Several versions of the MPC code are used to solve the problem in different scenarios, such as hourly pricing and net metering.
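
As an illustration only, the sketch below shows how a single-day horizon could be posed in cvxpy and handed to Gurobi; the horizon length, battery limits, efficiency, and placeholder data are assumptions and do not come from the project code.

    import cvxpy as cp
    import numpy as np

    # Hypothetical one-day horizon with placeholder price, load and solar data.
    T = 24
    price = np.random.rand(T)           # $/kWh, illustrative values
    load = 2.0 * np.random.rand(T)      # household load per hour (kWh)
    solar = 1.5 * np.random.rand(T)     # solar generation per hour (kWh)

    charge = cp.Variable(T)             # battery charging per hour
    discharge = cp.Variable(T)          # battery discharging per hour
    soc = cp.Variable(T + 1)            # battery state of charge

    constraints = [
        soc[0] == 0.5,                                   # assumed initial SoC
        soc[1:] == soc[:-1] + 0.9 * charge - discharge,  # assumed charge efficiency
        soc >= 0, soc <= 10,                             # assumed capacity (kWh)
        charge >= 0, charge <= 3,                        # assumed charge rate limit
        discharge >= 0, discharge <= 3,                  # assumed discharge rate limit
    ]

    # Pay the hourly price for any power drawn from the grid.
    grid_import = cp.pos(load + charge - solar - discharge)
    problem = cp.Problem(cp.Minimize(price @ grid_import), constraints)
    problem.solve(solver=cp.GUROBI)     # requires a Gurobi installation and licence
    print(problem.value)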

MPC.MPC2.cleanup(val)

Round very small values to 0

Parameters:val – learned policy
Returns:None
MPC.MPC2.compute(hour_var, battery_var, last)

Compute the best policy for the next day

Parameters:
  • hour_var – index of the hour of the year
  • battery_var – current battery SoC
  • last – binary value indicating whether this is the last episode
Returns:

None

MPC.MPC2.init_ground_truth(datafile)

Initialise household data, solar data and price data

Parameters:datafile – path to the input file
Returns:None
MPC.MPC2.predict_day(start)

Load new observations and predict the home use and solar generation of the next day

Parameters:start – index of hour of a year
Returns:use_list: predicted home use of the next day; ac_list: predicted AC output of the next day
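
A hypothetical calling sequence for the functions above is sketched below; the data path, loop bounds, and the assumption that predict_day returns a (use_list, ac_list) pair are illustrative readings of the docstrings, not verified behaviour.

    from MPC import MPC2

    # Load household, solar and price data once (assumed file path).
    MPC2.init_ground_truth("data/household_year.csv")

    battery_soc = 0.0                        # assumed initial state of charge
    for hour in range(0, 24 * 365, 24):      # step through the year one day at a time
        use_list, ac_list = MPC2.predict_day(hour)     # forecast next-day use and AC output
        last = hour >= 24 * 364                        # final episode flag
        MPC2.compute(hour, battery_soc, last)          # solve for the next day's policy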

Create hourly price

Create hourly price converts Unix time to UTC time and generates an hourly price table.

MPC.create_hourly_price_table.create()

Create the hourly price table: read the raw data with Unix timestamps and convert them to UTC time with the proper time zone (a conversion sketch follows below)

Returns:None
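
A minimal sketch of the conversion described above, using pandas; the file name, column names, and time zone are assumptions.

    import pandas as pd

    # Hypothetical raw price file with a Unix-timestamp column and a price column.
    raw = pd.read_csv("raw_prices.csv")
    raw["utc_time"] = pd.to_datetime(raw["unix_time"], unit="s", utc=True)

    # Shift to an assumed local time zone, then aggregate to one price per hour.
    raw["local_time"] = raw["utc_time"].dt.tz_convert("Australia/Sydney")
    hourly = raw.set_index("local_time")["price"].resample("1H").mean()
    hourly.to_csv("hourly_price_table.csv")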

Create TOU price

Create TOU price creates a time-of-use (TOU) price table for a year so that it can be loaded as a matrix in the solver.
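
For example, a year-long TOU table can be stored as a (365, 24) matrix that the solver indexes by day and hour; the tariff values and peak window below are assumptions.

    import numpy as np

    # Assumed TOU tariff: peak price between 14:00 and 20:00, off-peak otherwise.
    OFF_PEAK, PEAK = 0.10, 0.40            # $/kWh, illustrative values only
    daily = np.full(24, OFF_PEAK)
    daily[14:20] = PEAK

    # One row per day of the year, so the solver can look up price[day, hour].
    tou_table = np.tile(daily, (365, 1))
    np.savetxt("tou_price_table.csv", tou_table, delimiter=",")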

RL

Implementation of the reinforcement learning method.

A2C

A2C is an implementation of the actor-critic method, which generates the strategy given states and rewards.

class RL.solar_a2c_nonlinear.PolicyEstimator(learning_rate=0.01, scope='policy_estimator')

Bases: object

Policy Function approximator.

predict(state, sess=None)

Predict the action given a state

Parameters:
  • state – current state
  • sess – tensorflow session
Returns:

Updated session

update(state, target, action, sess=None)

Update the policy function

Parameters:
  • state – current state
  • target – td target
  • action – learned action
  • sess – tensorflow session
Returns:

loss

class RL.solar_a2c_nonlinear.ValueEstimator(learning_rate=0.1, scope='value_estimator')

Bases: object

Value Function approximator.

predict(state, sess=None)

Predict the value given a state

Parameters:
  • state – current state
  • sess – tensorflow session
Returns:

Updated session

update(state, target, sess=None)

Update the value function

Parameters:
  • state – current state
  • target – td target
  • sess – tensorflow session
Returns:

loss

RL.solar_a2c_nonlinear.actor_critic(env, month_var, battery_var, estimator_policy, estimator_value, num_episodes, discount_factor=1.0)

Actor Critic Algorithm. Optimizes the policy function approximator using policy gradient.

Parameters:
  • env – OpenAI environment.
  • month_var – index of month
  • battery_var – current battery SoC
  • estimator_policy – Policy Function to be optimized
  • estimator_value – Value function approximator, used as a critic
  • num_episodes – Number of episodes to run for
  • discount_factor – Time-discount factor
Returns:

An EpisodeStats object with two numpy arrays for episode_lengths and episode_rewards.
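
A hypothetical training call combining the estimators above with the environment documented below; the constructor arguments, month index, initial SoC, and episode count are illustrative, and whether actor_critic picks up the default TensorFlow session is an assumption.

    import tensorflow as tf
    from RL.environment import EnergyEnvironment
    from RL.solar_a2c_nonlinear import PolicyEstimator, ValueEstimator, actor_critic

    env = EnergyEnvironment(mode="ground_truth", charge_mode="TOU", payment_cycle=24)
    policy_estimator = PolicyEstimator(learning_rate=0.01)
    value_estimator = ValueEstimator(learning_rate=0.1)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # month_var=0 and battery_var=0.0 are placeholder starting conditions.
        stats = actor_critic(env, 0, 0.0, policy_estimator, value_estimator,
                             num_episodes=500, discount_factor=0.99)
        print(stats.episode_rewards[-10:])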

RL.solar_a2c_nonlinear.featurize_state(state)

RBF feature representation of a given state

Parameters:state – current state
Returns:state with new feature representation
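
RBF featurization of this kind is commonly built with scikit-learn's RBFSampler fitted on sampled observations; the sketch below shows that common pattern as an assumption about what featurize_state does (state dimension, gammas, and component counts are illustrative), not a copy of the project code.

    import numpy as np
    import sklearn.pipeline
    import sklearn.preprocessing
    from sklearn.kernel_approximation import RBFSampler

    # Fit a scaler and several RBF kernels on observations sampled from the
    # state space (here an assumed 3-dimensional state in [0, 1]).
    observation_examples = np.random.uniform(0.0, 1.0, size=(10000, 3))
    scaler = sklearn.preprocessing.StandardScaler().fit(observation_examples)

    featurizer = sklearn.pipeline.FeatureUnion([
        ("rbf1", RBFSampler(gamma=5.0, n_components=100)),
        ("rbf2", RBFSampler(gamma=2.0, n_components=100)),
        ("rbf3", RBFSampler(gamma=1.0, n_components=100)),
        ("rbf4", RBFSampler(gamma=0.5, n_components=100)),
    ])
    featurizer.fit(scaler.transform(observation_examples))

    def featurize_state(state):
        """Return the RBF feature representation of a single state."""
        scaled = scaler.transform([state])
        return featurizer.transform(scaled)[0]

    print(featurize_state([0.5, 0.2, 0.8]).shape)   # (400,)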

Environment

Environment formulates the problem as an MDP environment. It contains features to reset and initialise states, calculate the reward of a given state, and simulate the next state.

class RL.environment.EnergyEnvironment(mode='ground_truth', charge_mode='TOU', payment_cycle=24, datafile='test')

Bases: object

check_valid_action(action)

Check if the current action is a valid action given the constraints

Parameters:action – action of the current time slot
Returns:Binary value True or False
init_ground_truth()

Initialize the ground truth into states

Read solar generation data and house-load data and store them in a 2D array.

Returns:None
init_price()

Initialise the price table by reading the input file

Read it row by row and save it as an array

Returns:None
reset()

Initialise the state

Reset the current state to the end of the last episode

Returns:None
step(action)

The main function of the MDP environment. It reads in an action,

calculates the reward of this action, and returns the next state

Parameters:action – the action of the current state
Returns:total_need_grid: total power taken from the grid; return_state: the next state; binary value: whether the episode terminates
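
A hypothetical interaction loop with the environment; the action value is a placeholder, and unpacking step() into three values follows the docstring above but is not verified.

    from RL.environment import EnergyEnvironment

    env = EnergyEnvironment(mode="ground_truth", charge_mode="TOU", payment_cycle=24)
    env.reset()

    done = False
    while not done:
        action = 1.0                            # placeholder action, e.g. charge at 1 kW
        if not env.check_valid_action(action):  # fall back if the constraints reject it
            action = 0.0
        total_need_grid, next_state, done = env.step(action)
        print(total_need_grid)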

Baseline

Contains different rule-based baselines for different scenarios.

No solar panel

Get no solar gets the baseline strategy for when no solar panels and no battery are installed.

Baseline.get_nosolar.eval()

Evaluate the policy if it is already learned

Returns:None
Baseline.get_nosolar.pre_train(episodes)

Get the baseline if there is no solar panel installed

Parameters:episodes – Number of episodes to be run
Returns:None

No battery

Get no battery gets the baseline strategy for when solar is installed but no battery is installed.

Baseline.get_nostorage.eval()

Evaluate the policy if it is already learned

Returns:None
Baseline.get_nostorage.pre_train(episodes)

Learn a policy for the case where a solar panel is installed but no battery is installed. Sell solar power to the grid if there is a remainder after home use (see the sketch below)

Parameters:episodes – Number of episodes to be run
Returns:None
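
The selling rule described above can be written down directly; a minimal sketch with hypothetical names, not taken from the project code.

    def no_storage_action(home_use, solar_generation):
        """Solar but no battery: power the home from solar first and sell any
        remainder to the grid. Returns (grid_import, grid_export) in kWh."""
        surplus = solar_generation - home_use
        if surplus >= 0:
            return 0.0, surplus      # export the leftover solar
        return -surplus, 0.0         # import the shortfall from the grid

    print(no_storage_action(1.0, 2.5))   # (0.0, 1.5)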

Rule based policy

Get RBC gets a baseline strategy based on rule-based control under the TOU price.

Baseline.get_rbc.pre_train(episodes)

Learn the rule-based policy for a year: charge at the maximum rate when the solar tariff is low and discharge at the maximum rate when the solar tariff is high (see the sketch below)

Parameters:episodes – Number of episodes to be run
Returns:None
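
The rule itself is simple enough to sketch directly; the tariff thresholds and maximum rate below are assumptions, not values from the project.

    def rbc_action(tariff, low=0.15, high=0.30, max_rate=3.0):
        """Rule-based control: charge the battery at the maximum rate when the
        tariff is low, discharge at the maximum rate when it is high, otherwise
        do nothing. Thresholds and max_rate are illustrative values."""
        if tariff <= low:
            return max_rate       # charge
        if tariff >= high:
            return -max_rate      # discharge
        return 0.0

    print(rbc_action(0.10), rbc_action(0.40))   # 3.0 -3.0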

These three implementations call associated environments developed for these different scenarios.

DLC

Contains the implementation of the direct mapping method.

Mapping

Mapping maps the available features of each time slot to the optimal actions solved by the solver from oracle data.

Training neural network

DLC.tune.create_model(x_train, y_train, x_test, y_test)

Model-providing function: create a Keras model, with double curly brackets dropped in as needed.

Parameters:
  • x_train – train set
  • y_train – train target
  • x_test – test set
  • y_test – test target
Returns:

The return value has to be a valid Python dictionary with two customary keys:
  • loss – specify a numeric evaluation metric to be minimized
  • status – just use STATUS_OK and see the hyperopt documentation if that is not feasible
The last one is optional, though recommended:
  • model – specify the model just created so that it can be used again later.

DLC.tune.data()

Data providing function:

This function is separated from create_model() so that hyperopt won’t reload data for each evaluation run.

Returns:None
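
The "double curly brackets" and STATUS_OK convention above matches the hyperas/hyperopt tuning pattern; the sketch below illustrates that pattern with made-up data and an assumed search space, and is not the project's actual model.

    import numpy as np
    from hyperopt import STATUS_OK, Trials, tpe
    from hyperas import optim
    from hyperas.distributions import choice, uniform
    from keras.models import Sequential
    from keras.layers import Dense, Dropout


    def data():
        # Data-providing function: called once so the data is not reloaded
        # for every evaluation run.
        x = np.random.rand(1000, 10)
        y = np.random.rand(1000, 1)
        return x[:800], y[:800], x[800:], y[800:]


    def create_model(x_train, y_train, x_test, y_test):
        # Double curly brackets mark hyperparameters that hyperas substitutes
        # before each evaluation run.
        model = Sequential()
        model.add(Dense({{choice([32, 64, 128])}}, activation='relu', input_shape=(10,)))
        model.add(Dropout({{uniform(0, 0.5)}}))
        model.add(Dense(1))
        model.compile(loss='mse', optimizer='adam')
        model.fit(x_train, y_train, epochs=5, verbose=0)
        loss = model.evaluate(x_test, y_test, verbose=0)
        return {'loss': loss, 'status': STATUS_OK, 'model': model}


    if __name__ == '__main__':
        best_run, best_model = optim.minimize(model=create_model, data=data,
                                              algo=tpe.suggest, max_evals=10,
                                              trials=Trials())
        print(best_run)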