Notes on LINQ-like libraries for C++ and on C# LINQ

Overview

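While using LINQ I got curious about its evaluation cost and memory usage, so these are notes for myself.
Taking C#'s LINQ as the baseline, I describe how to use cpplinq and Boost.Range's Oven extension, and what their characteristics are.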

C# LINQ

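A query is built as an IEnumerable<T>, which is a lazily evaluated sequence and not an array.
The query expression is not evaluated when the query is created. It runs only when the query is enumerated, for example by a foreach loop or a call to Count().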

using System;
using System.Linq;

var lst = new [] {1, 2, 3, 4, 5, 6};

int query_counter = 0;
var q = lst
	.Where(x => {
		query_counter++;
		Console.Write(" a:" + x);
		return x % 2 == 0;
	})
	.Where(x => {
		query_counter++;
		Console.Write(" b:" + x);
		return x <= 4;
	})
	.Select(x => {
		query_counter++;
		Console.Write(" c:" + x);
		return x * 2;
	})
	;
Console.WriteLine("maked query");

int loop_counter = 0;
foreach (var x in q) loop_counter++;
Console.WriteLine();

Console.WriteLine("query_counter:" + query_counter + " loop_counter:" + loop_counter);

Output

maked query
 a:1 a:2 b:2 c:2 a:3 a:4 b:4 c:4 a:5 a:6 b:6
query_counter:11 loop_counter:2

The cpplinq library for C++

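Each query*1 is represented by its own container class derived from base_range.
Internally, the source sequence's iterators are stored in a container. Each query operator then stores the previous container, together with the functor used for evaluation, in a container of its own.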

(Figure: class diagram of the base class)

Note: the subclasses are created by builders, and they hold the upstream range and the functor as members.

Incidentally, cpplinq's ranges do not implement begin() and end(), perhaps because base_range does not hold iterators, so range-based for cannot be used.
To loop, you have to call next() and front() instead.

#include <iostream>
#include "cpplinq.hpp"

int main() {
	using namespace std;
	using namespace cpplinq;

	auto v = {1, 2, 3, 4, 5, 6};

	int query_counter = 0;
	auto q = from(v)
		>> where([&query_counter](const int &x) {
			query_counter++;
			cout << " a:" << x;
			return x % 2 == 0;
		})
		>> where([&query_counter](const int &x) {
			query_counter++;
			cout << " b:" << x;
			return x <= 4;
		})
		>> select([&query_counter](const int &x) {
			query_counter++;
			cout << " c:" << x;
			return x * 2;
		})
		;
	cout << endl << "maked query" << endl;

	int loop_counter = 0;
	// next() advances the query; front() reads the current element
	while (q.next()) { const auto &x = q.front(); loop_counter++; }
	// Note: q is used up here; it has been advanced to the end and cannot be looped over again
	cout << endl;

	cout << "query_counter:" << query_counter << " loop_counter:" << loop_counter << endl;
}

Output

maked query
 a:1 a:2 b:2 c:2 a:3 a:4 b:4 c:4 a:5 a:6 b:6
query_counter:11 loop_counter:2

Boost.Range's Oven extension for C++

Each query*2 is represented by its own container class derived from boost::iterator_range.

(Figure: class diagram of the base class)

Note: each subclass is created from an upstream range of the base class type, and it holds that range's iterators and the functor as members.

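Unlike cpplinq, this can be applied directly to a container.
Also unlike the others, anything that can be evaluated in advance is evaluated up front. Constructing a filtered range already advances it to the first element that satisfies the predicate, which is why " a:1 a:2 b:2" is printed before "maked query" in the output.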

#include <iostream>
#include <boost/range/adaptor/filtered.hpp>
#include <boost/range/adaptor/transformed.hpp>

int main() {
	auto v = {1, 2, 3, 4, 5, 6};

	using namespace boost::adaptors;
	using namespace std;

	int query_counter = 0;
	auto q = v
		| filtered([&query_counter](const int &x) {
			query_counter++;
			cout << " a:" << x;
			return x % 2 == 0;
		})
		| filtered([&query_counter](const int &x) {
			query_counter++;
			cout << " b:" << x;
			return x <= 4;
		})
		| transformed([&query_counter](const int &x) { // Since Boost 1.54.0, transformed accepts lambdas too. Nice.
			query_counter++;
			cout << " c:" << x;
			return x * 2;
		})
		;
	cout << endl << "maked query" << endl;

	int loop_counter = 0;
	for (const auto &x : q) loop_counter++;
	cout << endl;

	cout << "query_counter:" << query_counter << " loop_counter:" << loop_counter << endl;
}

Output

 a:1 a:2 b:2
maked query
 c:2 a:3 a:4 b:4 c:4 a:5 a:6 b:6
query_counter:11 loop_counter:2

Summary

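・In C++, Boost.Range's Oven extension is the standard choice.
・If compile time is a concern, use cpplinq*3.
・Performance and memory usage should be about the same, because internally both just advance iterators*4 and evaluate functors, without building intermediate containers*5.
・Both are LINQ-like, so the difference in syntax is ultimately a matter of getting used to it.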

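*1: I call them "queries" for the sake of the discussion; strictly speaking, that is not accurate.

*2: Same as *1.

*3: That said, I think Oven's compile time is also small enough to ignore.

*4: Strictly speaking, cpplinq differs here: its ranges advance with next() rather than exposing iterators.

*5: In my measurement, cpplinq was about twice as fast without optimization, and with -O2 there was no difference.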