$\Large \displaystyle x_i=x_i(q_1,q_2,\dots,q_n)$
$\Large \displaystyle \dot{x}_i=\sum_{j=1}^n \frac{\partial x_i}{\partial q_j}\dot{q}_j \ \ \ \ (i=1,2,\dots,n)$
$\Large \displaystyle p_i=\frac{\partial T}{\partial \dot{q_i}}$
$\Large \displaystyle L=T-U$
$\Large \displaystyle p_i=\frac{\partial L}{\partial \dot{q_i}}$
$\Large \displaystyle F_i=\frac{\partial L}{\partial q_i}$
$\Large \displaystyle \frac{d}{dt}\left (\frac{\partial L}{\partial \dot{q_i}}\right)=\frac{\partial L}{\partial q_i}$
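The Euler–Lagrange equation above can be checked symbolically. A minimal sketch with SymPy, using a 1-D harmonic oscillator as a hypothetical example system (not one from the text):

```python
import sympy as sp

t = sp.symbols('t')
m, k = sp.symbols('m k', positive=True)
q = sp.Function('q')(t)

# Lagrangian L = T - U for a 1-D harmonic oscillator (illustrative choice)
L = sp.Rational(1, 2) * m * sp.diff(q, t)**2 - sp.Rational(1, 2) * k * q**2

# Euler-Lagrange equation: d/dt (dL/d(qdot)) - dL/dq = 0
eom = sp.diff(sp.diff(L, sp.diff(q, t)), t) - sp.diff(L, q)
print(sp.simplify(eom))  # m*q'' + k*q, i.e. the familiar equation of motion
```

SymPy differentiates with respect to the symbol `Derivative(q, t)` directly, which is exactly the generalized momentum $p=\partial L/\partial \dot q$ from above.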
$\Large \displaystyle bel(x_t)=\eta\int_X p(x_t|u_t,x_{t-1})\,bel(x_{t-1})\,dx_{t-1}$
$\Large \displaystyle bel(x|y)=\eta p(y|x)bel(x)$
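The two update rules above, prediction with the motion model followed by correction with the measurement likelihood (where $\eta$ normalizes), can be sketched for a discrete state space. The transition and likelihood numbers below are illustrative assumptions, not values from the text:

```python
import numpy as np

def predict(bel, transition):
    """Prediction step: bel_bar[x'] = sum_x p(x'|u,x) * bel[x].
    transition[x, x_next] = p(x_next | u, x)."""
    return bel @ transition

def correct(bel_bar, likelihood):
    """Correction step: bel[x] = eta * p(y|x) * bel_bar[x]."""
    unnormalized = likelihood * bel_bar
    return unnormalized / unnormalized.sum()  # eta = 1 / sum

bel = np.array([1/3, 1/3, 1/3])           # uniform prior over 3 states
T = np.array([[0.8, 0.2, 0.0],            # p(x'|u,x); rows sum to 1
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
likelihood = np.array([0.9, 0.1, 0.1])    # p(y|x) for the observed y

bel = correct(predict(bel, T), likelihood)
print(bel)  # posterior concentrates on state 0, which explains y best
```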
$\Large \displaystyle V(x_t)=\gamma \max_{u_t}\left[r(x_t,u_t) + \int V(x')p(x'|u_t,x_t)dx'\right]$
Quoted from "強化学習に出てくるベルマン方程式を理解しよう" (Understanding the Bellman equation that appears in reinforcement learning).
$\Large \displaystyle \dot{x}=Ax+Bu$
$\Large \displaystyle J=\int_0^\infty\left((x_{ref}-x)^TQ(x_{ref}-x)+u^TRu\right)dt$
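For this linear-quadratic cost (taking $x_{ref}=0$ for regulation), the optimal feedback gain follows from the algebraic Riccati equation. A sketch using SciPy's `solve_continuous_are`, with a double-integrator plant chosen purely as a hypothetical example:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# LQR for x' = Ax + Bu minimizing the integral of x^T Q x + u^T R u.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])   # double integrator (illustrative plant)
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)                # state weight
R = np.array([[1.0]])        # input weight

# Solve the continuous-time algebraic Riccati equation for P;
# the optimal control is then u = -Kx with K = R^{-1} B^T P.
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)

eigs = np.linalg.eigvals(A - B @ K)
print(K, eigs)  # closed-loop eigenvalues have negative real parts
```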
$\Large \displaystyle J=\int_0^\infty l(x,u,t)dt$
$\Large \displaystyle J_t=\int_t^\infty l(x,u,t)dt$
$\Large \displaystyle V_t=\min_u J_t$
We rewrite the cost-to-go this way and call it the value function. In particular,
$\Large \displaystyle V_0=\min_u J_0=\min_u J$
so the value function at the initial time is exactly the minimum of the original cost functional. Making the dependence on the state explicit,
$\Large \displaystyle V(x(t),t)=\min_u J(x(t),u(t),t)$
$\Large \displaystyle =\min_u \int_t^\infty l(x,u,t)dt$
$\Large \displaystyle =\min_u \int_t^{t+dt} l(x,u,t)dt+\min_u \int_{t+dt}^\infty l(x,u,t)dt$
$\Large \displaystyle =\min_u \left[\int_t^{t+dt} l(x,u,t)dt+V(x(t+dt),t+dt)\right]$
$\Large \displaystyle V(x(t),t)=\min_u \left[\int_t^{t+dt} l(x,u,t)dt+V(x(t+dt),t+dt)\right]$
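Taylor-expanding the last term to first order in $dt$ (a standard step, filled in here for completeness) gives
$\Large \displaystyle V(x(t+dt),t+dt)\simeq V(x(t),t)+\frac{\partial V}{\partial t}dt+\frac{\partial V}{\partial x}f(x,u)dt$
and substituting this back into the recursion and dividing by $dt$ yields the Hamilton–Jacobi–Bellman equation
$\Large \displaystyle -\frac{\partial V}{\partial t}=\min_u\left[l(x,u,t)+\frac{\partial V}{\partial x}f(x,u)\right]$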
$\Large \displaystyle V(x(t),t)=\min_u \left[l(x(t),u(t),t)+V(x(t+1),t+1)\right]$
$\Large \displaystyle V(x_t)=\min_u \left[l(x_t,u_t)+V(x_{t+1})\right]$
$\Large \displaystyle \dot{x}=Ax+Bu$
$\Large \displaystyle \dot{x}=f(x,u)$
$\Large \displaystyle x_{t+1}=g(x_t,u_t)$
$\Large \displaystyle x_{t+1}\sim p(x_{t+1}|x_t,u_t)$
$\Large \displaystyle V(x_t)=\max_u \left[R(x_t,u_t)+\gamma \sum_{x_{t+1}}p(x_{t+1}|x_t,u_t)V(x_{t+1})\right]$
$\Large \displaystyle V(x_t)=\min_u \left[l(x_t,u_t)+V(x_{t+1})\right]$
$\Large \displaystyle V(x_t)=\max_u \left[l(x_t,u_t)+V(x_{t+1})\right]$
$\Large \displaystyle V(x_t)=\max_u \left[l(x_t,u_t)+\gamma V(x_{t+1})\right]$
$\Large \displaystyle x_{t+1}\sim p(x_{t+1}|x_t,u_t)$
$\Large \displaystyle V(x_t)=\max_u \left[l(x_t,u_t)+\gamma \sum_{x_{t+1}}p(x_{t+1}|x_t,u_t)V(x_{t+1})\right]$
$\Large \displaystyle V(x_t)=\max_u \left[R(x_t,u_t)+\gamma \sum_{x_{t+1}}p(x_{t+1}|x_t,u_t)V(x_{t+1})\right]$
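This final Bellman equation can be solved by value iteration, repeatedly applying the right-hand side as an update until $V$ converges. A minimal sketch on a tiny two-state, two-action MDP whose transition probabilities and rewards are made-up illustrative numbers:

```python
import numpy as np

# Value iteration for V(x) = max_u [ R(x,u) + gamma * sum_x' p(x'|x,u) V(x') ]
gamma = 0.9
P = np.array([                 # P[u, x, x'] = p(x' | x, u)
    [[0.9, 0.1], [0.2, 0.8]],  # action 0
    [[0.5, 0.5], [0.0, 1.0]],  # action 1
])
R = np.array([                 # R[x, u]
    [0.0, 1.0],
    [2.0, 0.0],
])

V = np.zeros(2)
for _ in range(500):
    # Q[x, u] = R[x, u] + gamma * sum_x' P[u, x, x'] * V[x']
    Q = R + gamma * np.einsum('uxy,y->xu', P, V)
    V_new = Q.max(axis=1)      # greedy over actions = the max_u above
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new
print(V)
```

Because the update is a $\gamma$-contraction, the loop converges to the unique fixed point of the Bellman equation regardless of the initial $V$.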
First, let us derive the equation that forms the foundation of optimal control theory.
$\Large \displaystyle \frac{dx}{dt}=f(x,u,t)$
$\Large \displaystyle J[x(t),u(t)]=\varphi(x(t_f)) + \int_{t_0}^{t_f}L(x(t),u(t),t)dt$
$\Large \displaystyle P(S|X)=\frac{P(X|S)P(S)}{\int P(X|S)P(S)dS}$
$\Large \displaystyle P(X_{t+1}|X_t,\dots,X_1)=P(X_{t+1}|X_t)$
$\Large \displaystyle r=\frac{f(\theta^{(a)})}{f(\theta^{(t)})}$
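The acceptance ratio $r$ above drives the Metropolis algorithm: propose a candidate $\theta^{(a)}$, accept it with probability $\min(1,r)$, otherwise keep $\theta^{(t)}$. A minimal sketch targeting a standard normal density, where the target $f$ and the proposal width are assumptions chosen for illustration:

```python
import math
import random

def f(theta):
    # Unnormalized target density: standard normal up to a constant.
    return math.exp(-0.5 * theta * theta)

random.seed(0)
theta = 0.0                   # current sample theta^(t)
samples = []
for _ in range(50_000):
    candidate = theta + random.uniform(-1.0, 1.0)  # proposal theta^(a)
    r = f(candidate) / f(theta)                    # acceptance ratio
    if random.random() < r:                        # accept with prob min(1, r)
        theta = candidate
    samples.append(theta)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # should approach 0 and 1 for this target
```

Note that only the ratio of densities is ever needed, which is why $f$ may be unnormalized, the same reason $\eta$ drops out of the Bayesian posterior above.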
Conceptual diagram of sampling with the Metropolis method.
From "【ChatGPT】OpenAI APIの試用期間後の課金方法!移行手順を解説".