API documentation¶
MPC¶
This part contains the implementation of model predictive control.
Main code¶
The main MPC code uses cvxpy with the Gurobi solver to solve the problem as a convex optimization program. Several versions of the MPC code handle different scenarios, such as hourly pricing and net metering.
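The following is only a minimal sketch of how such a battery-scheduling problem can be posed in cvxpy and handed to Gurobi; the horizon, prices, loads, and battery limits are illustrative assumptions, not the project's actual formulation:

    import cvxpy as cp
    import numpy as np

    T = 24                               # one day of hourly slots (assumed)
    price = np.ones(T)                   # hourly grid price (placeholder data)
    load = np.ones(T)                    # predicted home use (placeholder data)
    solar = np.zeros(T)                  # predicted solar output (placeholder data)
    cap, rate = 10.0, 2.0                # assumed battery capacity and charge rate

    charge = cp.Variable(T)              # battery charge (+) / discharge (-) per slot
    soc = cp.cumsum(charge)              # state of charge over the horizon
    grid = load - solar + charge         # power drawn from the grid each slot

    # Pay only for imported power; the problem stays convex because cp.pos is.
    cost = cp.sum(cp.multiply(price, cp.pos(grid)))
    constraints = [cp.abs(charge) <= rate, soc >= 0, soc <= cap]
    problem = cp.Problem(cp.Minimize(cost), constraints)
    problem.solve(solver=cp.GUROBI)      # requires gurobipy; omit to use the default solver
    print(charge.value)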
MPC.MPC2.cleanup(val)¶
Round very small values to 0.
Parameters: val – learned policy
Returns: None
MPC.MPC2.compute(hour_var, battery_var, last)¶
Compute the best policy for the next day.
Parameters:
- hour_var – index of the hour of the year
- battery_var – current battery SoC
- last – binary value indicating whether this is the last episode
Returns: None
MPC.MPC2.init_ground_truth(datafile)¶
Initialise the household data, solar data, and price data.
Parameters: datafile – path to the input file
Returns: None
MPC.MPC2.predict_day(start)¶
Load new observations and predict the home use and solar generation of the next time slot.
Parameters: start – index of the hour of the year
Returns: use_list – predicted home use for the next day; ac_list – predicted AC output for the next day
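Taking the signatures documented above at face value, a driver loop might look like the following sketch; the input path, the SoC value, and the assumption that predict_day() returns the two lists as a pair are hypothetical:

    from MPC import MPC2

    MPC2.init_ground_truth('data/input.csv')        # hypothetical input path
    for hour in range(0, 24 * 365, 24):             # step through the year day by day
        use_list, ac_list = MPC2.predict_day(hour)  # forecasts for the next day
        last = hour >= 24 * 364                     # final episode of the year
        MPC2.compute(hour, 0.5, last)               # 0.5 = assumed current SoC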
Create hourly price¶
Create hourly price converts unix timestamps to UTC time and generates an hourly price table.
MPC.create_hourly_price_table.create()¶
Create the hourly price table: read the raw data with unix timestamps and convert them to UTC with the proper time zone.
Returns: None
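The conversion itself can be done with the standard library alone; the sketch below shows the general idea, with the example timestamp and the truncation to whole hours as assumptions:

    from datetime import datetime, timezone

    def to_utc_hour(unix_ts):
        # Convert a unix timestamp to a timezone-aware UTC datetime,
        # truncated to the start of its hour.
        dt = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
        return dt.replace(minute=0, second=0, microsecond=0)

    print(to_utc_hour(1500000000))  # 2017-07-14 02:00:00+00:00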
Create TOU price¶
Create TOU price table creates a TOU price table for a whole year, so that it can be loaded as a matrix by the solver.
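A minimal sketch of building such a year-long matrix follows; the tariff values, the peak window, and the output filename are illustrative assumptions, not the project's actual tariff:

    import numpy as np

    off_peak, peak = 0.10, 0.30         # assumed $/kWh rates
    day = np.full(24, off_peak)
    day[16:21] = peak                   # assumed peak window: 16:00-21:00
    tou_table = np.tile(day, (365, 1))  # one row per day, one column per hour
    np.savetxt('tou_price.csv', tou_table, delimiter=',')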
RL¶
Implementation of the reinforcement learning method.
A2C¶
A2C is an implementation of the actor-critic method, which generates a strategy given states and rewards.
class RL.solar_a2c_nonlinear.PolicyEstimator(learning_rate=0.01, scope='policy_estimator')¶
Bases: object
Policy function approximator.
predict(state, sess=None)¶
Predict the action given a state.
Parameters:
- state – current state
- sess – TensorFlow session
Returns: the predicted action
update(state, target, action, sess=None)¶
Update the policy function.
Parameters:
- state – current state
- target – TD target
- action – learned action
- sess – TensorFlow session
Returns: loss
class RL.solar_a2c_nonlinear.ValueEstimator(learning_rate=0.1, scope='value_estimator')¶
Bases: object
Value function approximator.
predict(state, sess=None)¶
Predict the value given a state.
Parameters:
- state – current state
- sess – TensorFlow session
Returns: the predicted value
update(state, target, sess=None)¶
Update the value function.
Parameters:
- state – current state
- target – TD target
- sess – TensorFlow session
Returns: loss
RL.solar_a2c_nonlinear.actor_critic(env, month_var, battery_var, estimator_policy, estimator_value, num_episodes, discount_factor=1.0)¶
Actor-critic algorithm. Optimizes the policy function approximator using policy gradient.
Parameters:
- env – OpenAI environment
- month_var – index of the month
- battery_var – current battery SoC
- estimator_policy – policy function to be optimized
- estimator_value – value function approximator, used as a critic
- num_episodes – number of episodes to run
- discount_factor – time-discount factor
Returns: An EpisodeStats object with two numpy arrays, episode_lengths and episode_rewards.
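Based only on the signatures documented above, a training run might be wired up as follows; the session handling assumes the TensorFlow 1.x API, and all argument values are placeholders:

    import tensorflow as tf
    from RL.environment import EnergyEnvironment
    from RL.solar_a2c_nonlinear import PolicyEstimator, ValueEstimator, actor_critic

    env = EnergyEnvironment(mode='ground_truth', charge_mode='TOU')
    policy = PolicyEstimator(learning_rate=0.01)   # builds the policy graph
    value = ValueEstimator(learning_rate=0.1)      # builds the critic graph

    with tf.Session() as sess:                     # TF1-style session (assumed)
        sess.run(tf.global_variables_initializer())
        stats = actor_critic(env, month_var=0, battery_var=0.5,
                             estimator_policy=policy, estimator_value=value,
                             num_episodes=100, discount_factor=0.99)
        print(stats.episode_rewards[-1])           # reward of the final episode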
RL.solar_a2c_nonlinear.featurize_state(state)¶
RBF feature representation of a given state.
Parameters: state – current state
Returns: the state in its new feature representation
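One common way to build such a representation is scikit-learn's RBFSampler; the sketch below only illustrates the technique, and the kernel widths, component counts, and state dimensionality are assumptions:

    import numpy as np
    import sklearn.pipeline
    from sklearn.kernel_approximation import RBFSampler

    # Fit the samplers on example states (random placeholders here).
    observation_examples = np.random.randn(1000, 2)   # assumed 2-dimensional states
    featurizer = sklearn.pipeline.FeatureUnion([
        ('rbf1', RBFSampler(gamma=5.0, n_components=50)),
        ('rbf2', RBFSampler(gamma=1.0, n_components=50)),
    ])
    featurizer.fit(observation_examples)

    def featurize_state(state):
        # Map a raw state to its concatenated RBF feature vector.
        return featurizer.transform(np.atleast_2d(state))[0]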
Environment¶
Environment formulates the problem as an MDP environment. It can reset and initialise the states, calculate the reward of a given state, and simulate the next state.
class RL.environment.EnergyEnvironment(mode='ground_truth', charge_mode='TOU', payment_cycle=24, datafile='test')¶
Bases: object
check_valid_action(action)¶
Check whether the current action is valid given the constraints.
Parameters: action – action for the current time slot
Returns: binary value, True or False
init_ground_truth()¶
Initialize the ground truth into states.
Read the solar generation data and house-load data and store them in a 2D array.
Returns: None
init_price()¶
Initialise the price table from the input file.
Read it row by row and save it as an array.
Returns: None
reset()¶
Initialise the state.
Reset the current state to the end of the last episode.
Returns: None
step(action)¶
The main function of the MDP environment: it reads in an action, calculates the reward of that action, and returns the next state.
Parameters: action – the action for the current state
Returns: total_need_grid – total power taken from the grid; return_state – the next state; binary value – whether the episode terminates
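Put together, an interaction loop with the environment might look like the sketch below; the constructor arguments and the unpacking of step()'s three return values follow the documentation above, while the do-nothing policy is a placeholder:

    from RL.environment import EnergyEnvironment

    env = EnergyEnvironment(mode='ground_truth', charge_mode='TOU',
                            payment_cycle=24, datafile='test')
    env.reset()
    done = False
    while not done:
        action = 0.0                           # placeholder policy: do nothing
        assert env.check_valid_action(action)  # validity check per the docs
        total_need_grid, state, done = env.step(action)
    print(total_need_grid)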
Baseline¶
Contains different rule-based baselines for different scenarios.
No solar panel¶
Get no solar computes the baseline strategy when neither a solar panel nor a battery is installed.
Baseline.get_nosolar.eval()¶
Evaluate the policy if it has already been learned.
Returns: None
Baseline.get_nosolar.pre_train(episodes)¶
Get the baseline when no solar panel is installed.
Parameters: episodes – number of episodes to run
Returns: None
No battery¶
Get no battery computes the baseline strategy when a solar panel is installed but no battery.
Baseline.get_nostorage.eval()¶
Evaluate the policy if it has already been learned.
Returns: None
Baseline.get_nostorage.pre_train(episodes)¶
Learn a policy when a solar panel is installed but no battery: sell solar power to the grid if there is a remainder after home use.
Parameters: episodes – number of episodes to run
Returns: None
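The no-battery rule itself reduces to a few lines; the sketch below uses hypothetical names and prices to show the idea, and is not the module's actual code:

    def no_battery_step(load, solar, buy_price, sell_price):
        # Net cost of one time slot without storage: buy any shortfall,
        # sell any solar remainder back to the grid.
        net = load - solar
        if net >= 0:
            return net * buy_price     # import the shortfall
        return net * sell_price        # negative cost: revenue from selling

    print(no_battery_step(load=1.5, solar=2.0, buy_price=0.30, sell_price=0.10))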
Rule based policy¶
Get RBC computes a baseline strategy using rule-based control under TOU pricing.
Baseline.get_rbc.pre_train(episodes)¶
Learn the rule-based policy for a year: charge at the maximum rate when the tariff is low and discharge at the maximum rate when the tariff is high.
Parameters: episodes – number of episodes to run
Returns: None
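A sketch of that rule, with hypothetical tariff thresholds and charge rate standing in for the module's actual values:

    def rbc_action(price, low_tariff=0.10, high_tariff=0.30, max_rate=2.0):
        # Charge at the maximum rate under the low tariff and discharge
        # at the maximum rate under the high tariff; otherwise idle.
        if price <= low_tariff:
            return max_rate            # charge
        if price >= high_tariff:
            return -max_rate           # discharge
        return 0.0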
These three implementations call the associated environments developed for these different scenarios.
DLC¶
Contains the implementation of the direct mapping method.
Mapping¶
Mapping maps the available features of each time slot to the optimal actions solved by the solver on oracle data.
Training neural network¶
DLC.tune.create_model(x_train, y_train, x_test, y_test)¶
Model-providing function: create a Keras model with double curly brackets dropped in as needed.
Parameters:
- x_train – training set
- y_train – training targets
- x_test – test set
- y_test – test targets
Returns: a valid Python dictionary with two customary keys:
- loss – a numeric evaluation metric to be minimized
- status – just use STATUS_OK; see the hyperopt documentation if that is not feasible
A third key is optional, though recommended:
- model – the model just created, so that it can be used again later
DLC.tune.data()¶
Data-providing function.
This function is separated from create_model() so that hyperopt won't reload the data for each evaluation run.
Returns: the training and test data
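The create_model()/data() split with double curly brackets is the hyperas convention: hyperas rewrites the function source, replacing each {{...}} template with a sampled hyperparameter before the code runs. The sketch below illustrates that contract; the layer sizes, dropout range, training settings, and placeholder data are assumptions, not the project's model:

    from hyperas import optim
    from hyperas.distributions import choice, uniform
    from hyperopt import STATUS_OK, Trials, tpe
    from keras.models import Sequential
    from keras.layers import Dense, Dropout

    def data():
        # Load the train/test split once, outside the search loop.
        import numpy as np
        x = np.random.randn(1000, 8)          # placeholder features
        y = np.random.randn(1000, 1)          # placeholder targets
        return x[:800], y[:800], x[800:], y[800:]

    def create_model(x_train, y_train, x_test, y_test):
        model = Sequential()
        model.add(Dense({{choice([64, 128])}}, activation='relu',
                        input_shape=(x_train.shape[1],)))
        model.add(Dropout({{uniform(0, 0.5)}}))   # templates substituted by hyperas
        model.add(Dense(1))
        model.compile(loss='mse', optimizer='adam')
        model.fit(x_train, y_train, epochs=10, verbose=0)
        loss = model.evaluate(x_test, y_test, verbose=0)
        return {'loss': loss, 'status': STATUS_OK, 'model': model}

    best_run, best_model = optim.minimize(model=create_model, data=data,
                                          algo=tpe.suggest, max_evals=10,
                                          trials=Trials())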