I agree with the author, we need better primitives, if you need functionality now:
Major tools that exist today for partial structure traversal and focused manipulation:
- Optics (Lenses, Prisms, Traversals)
Elegant, composable ways to zoom into, modify, and rebuild structures.
Examples: Haskell's `lens`, Scala's Monocle, Clojure's Specter.
Think of these as programmable accessors and updaters.
- Zippers
Data structures with a "focused cursor" that allow local edits without manually traversing the whole structure.
Examples: Huet’s original Zipper (1997), Haskell’s `Data.Tree.Zipper`, Clojure’s built-in zippers.
- Query Languages (for semantic traversal and deep search)
When paths aren't enough and you need semantic conditionals:
- SPARQL (semantic web graph querying)
- Datalog (logic programming and query over facts)
- Cypher (graph traversal in Neo4j)
- Prolog (pure logic exploration)
These approaches let you declaratively state what you want instead of manually specifying traversal steps.
Traversable and lenses are very closely linked. If you go to the original paper leading to Traversable [1] and read through it, it feels basically identical to reading through the parts of the lens library that lay down the core abstractions and the laws implementations must follow if you want to be able to blindly manipulate them. In fact, the traverse function is a Traversal, and so fits trivially into the lens ecosystem.
Arent haskell traversables different in that they preserve the structure if you were to map over them, as compared to the solution posted in the article, where they get flattened to what amounts to a list?
jasperry 31 days ago [-]
These are what I think the author is looking for. But it shouldn't be a "primitive" in terms of code automatically generated by the compiler, but an interface or typeclass like your examples (in a language advanced enough to have them.)
The problem is that 'lens', 'monocle', etc. are famously abstract and difficult for people to apply to their actual problems. IMO, the solution would be for standard libraries to specify interfaces called 'BreadthFirstTraverse', 'DepthFirstTraverse', etc.
T-R 31 days ago [-]
I definitely agree for traversals, but Lenses need some sort of primitive support - even in Haskell they're mostly generated with TemplateHaskell, and the language developers have spent a long time trying to make the `record.field` accessor syntax overloadable enough to work with lenses[1][2]. Hopefully someday we'll be free from having to memorize all the lens operators.
Optics are famously abstract in implementation, but I don't think people have trouble applying them - people seem to like JQuery/CSS selectors, and insist on `object.field` syntax; it's kind of wild that no mainstream language has a first-class way to pass around the description of a location in an arbitrary data structure.
Optics let you concisely describe the location, but defer the dereferencing, so you could definitely approximate optics, not by passing around pointers you compute with `offsetof`, but passing around functions that use `offsetof` to return memory locations to reference (read/write to). You could certainly write a composition operator for `*(*T) => List<*R>`... Some people have done something like it[1][2]:
Account acc = getAccount();
QVERIFY(acc.person.address.house == 20);
auto houseLens = personL() to addressL() to houseL();
std::function<int(int)> modifier = [](int old) { return old + 6; };
Account newAcc2 = over(houseLens, newAcc1, modifier);
These also use templating to get something that still feels maybe a little less ergonomic than it could be, though.
Haskell has 'Traversable' (and 'Foldable' etc) which are a lot more approachable than the fully generalised lens library.
naasking 31 days ago [-]
> These are what I think the author is looking for. But it shouldn't be a "primitive" in terms of code automatically generated by the compiler
I think people are often too enamored by general purpose languages that can express such abstractions natively. I don't see an issue with a language that provides this as a primitive without being able to express it itself, constraints can be useful for other properties. Once you can traverse trees, most programming problems can be tackled even in such constrained languages, eg. SQL with CTE.
Avshalom 31 days ago [-]
To point out a prolog thing which is also applicable to other languages with good patter matching: the break/return/prune examples are all ergonomic to implement as recursion in a way that fails in C++ style type based dispatch.
rebeccaskinner 31 days ago [-]
Don’t forget recursion schemes. The last example in the article was just asking for a hylomorphism.
moomin 30 days ago [-]
Honestly when I read the article I was, "Do you have a minute to talk about our Lord and Saviour, derive (Functor) and how her son hylo saved us all?"
jll29 30 days ago [-]
In Rust, you can extend the syntax a bit within the language itself.
For C++, you could define yourself a template that expands to the two functions you listed.
For any language, you could write yourself a pre-processor that adds for_tree notation and expands it,
either with pre-processor semantics or working on the abstract syntax tree (which is more "proper" but also more work"). I would recommend the former to test notations, and then you can live with them for a while to experiment with them and see if and how often you really need the construct (recall that is how C++ evolved from C via "C with classes" - it was first a pre-processor).
Once you and others are 100% convinced of the new syntax, go to your prefered language's working group/ISO committee and propose inclusion.
My own feeling is that this is not something for which I need more than recursion; calling inside traverse() traverse(left) and traverse(right) for binary trees or using a normal for loop to iterate over all this->children() from 0 to this->children.size() is something that occurs in graph libraries and parsers/compilers once in a while but not often enough to warrant its own notation. Rather, when I look at languages like C++, I'd have a language that is simpler, cleaner, more homogeneous and more orthogonal; C++ is complicated, convoluted, too large to implement it correctly by a single person in reasonable time (compared to beauties like C, Pascal, Scheme), so I stand on the side of minimalism.
timeninja 30 days ago [-]
The "syntax extension" thing is precisely the reason I have an interest in (and advocacy for) Common Lisp.
Be that as it may, for C++, Eric Neibler's [Boost.Proto](https://www.boost.org/doc/libs/1_84_0/doc/html/proto.html) could likely help conveniently connect syntax to implementation to achieve something similar to what the author is taking about.
postepowanieadm 31 days ago [-]
SQL recursive CTE maybe?
adamgordonbell 31 days ago [-]
Even just traverse from scala, haskell, kotlin, etc will work iterating over trees.
larodi 31 days ago [-]
how about we also get regex-parsable streams (IO::Async in perl has something like it, suboptimal perhaps) and regex-parsable treestructures (totally possible)? seems like just having the ~= work on structures (or whatever the API is called in other languages, this being Perl5)?
melagonster 29 days ago [-]
Does this mean something like $tree_object =~ m/regex/ will work and return a sub tree?
larodi 28 days ago [-]
Yeah and if you think about it is totally doable. The FSM or whatever automata is just picks and exhausts branches that provide the correct value.
Definitely can be done to streaming data… protobufs in a way is this, given it’s a sort of BNF from a bird’s eye.
Seriously, I thought of those the moment I read the title of this post.
CyberDildonics 31 days ago [-]
This seems like dramatically over complicated iterators. Do they really need to be called 'optics' and 'lenses' and 'prisms' ?
Think of these as programmable accessors and updaters.
How is iterating through something already not 'programmable' ?
bts 31 days ago [-]
Indeed they call for new names, as they encompass far more than iterators.
If you read a bit more about them, I think you will be surprised to see the breadth of what these abstractions can be used for. To start, they've been used to build a new compositional foundation for game theory[1], and they can be used to model gradient-based learning in machine learning[2].
As for their simpler applications as getters and setters, they are programmable in the sense that you can think of lens/prism types as interfaces that can be implemented arbitrarily. So you can create your own lenses and combine them like legos to construct new, "bigger" getters and setters from the smaller components.
This thread is about traversing a tree. At what point do we take a step back and think that iterating through a data structure and "building new compositional foundations for game theory" shouldn't be conflated together?
When does someone give up on the pagentry of programming and just get something done by looping through data instead of "constructing getters and setters to model gradient based machine learning".
It really seems like the straight forward way of doing things is to make simple iteration and worry about the game theory after that is all figured out.
throwaway17_17 31 days ago [-]
I do get, and to some extent sympathize with, your position. But the comments to the article are, in large part, fundamentally addressing a higher level of abstraction than the more narrow scope of the article’s premise. As several commenters have referenced, traversing a tree in a ‘simple’ and narrow manner is addressed at the appropriate level of abstraction using already established mechanisms, particularly in Haskell using Traversable (a typeclass with a standardized implementation for trees). The discussion of Optics and Lenses are more of a side discussion about the syntax and implementations of a more broad set of data structures than merely trees.
I think your comment is valuable in spirit because it could bring a more thorough discussion of the creation or popularization of a syntactically clean and a shift of the idiomatic-ness of such a solution to traversing trees. Leaving the more abstract and general Optics discussion to be a side dish for commenters to discuss as well.
Also, just a nitpick, but traversing a tree is a broader topic than iteration. Iteration implies, both colloquially and formally, performing a computation on each element of a data structure, while traversal is a more general algorithm that could be the basis of iteration, but may also be more akin to a search through the data structure without expectation of performing computation until the searched for ‘member’ is returned from the traversal. So it’s not an apples-to-oranges comparison with your conflation of iteration and traversal, but does seem a little like Canis Familiaris-to-Mammal comparison.
CyberDildonics 30 days ago [-]
I read your whole comment but I'm still not getting where the actual rubber meets the road. If I have a tree data structure what am I missing? What is it that I can't do that makes this extreme iteration complexity worthwhile? To me, having simple access to the data structure is enough because that's what I want a data structure for.
9. It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.
"Let's move on."
nimih 31 days ago [-]
In addition, clojure.core has the handy tree-seq function:
(defn tree-seq
"Returns a lazy sequence of the nodes in a tree, via a depth-first walk.
branch? must be a fn of one arg that returns true if passed a node
that can have children (but may not). children must be a fn of one
arg that returns a sequence of the children. Will only be called on
nodes for which branch? returns true. Root is the root node of the
tree."
{:added "1.0"
:static true}
[branch? children root]
(let [walk (fn walk [node]
(lazy-seq
(cons node
(when (branch? node)
(mapcat walk (children node))))))]
(walk root)))
Xmd5a 31 days ago [-]
(defn tree-seq-breadth
"Like tree-seq, but in breadth-first order"
[branch? children root]
(let [walk (fn walk [node]
(when (branch? node)
(let [cs (children node)]
(lazy-cat cs (mapcat walk cs)))))]
(cons root (walk root))))
Didn't Wirth(may be it was someone else) say that it is better to have a complex data structure and simple algorithm/code that works on them than having simple data structures and complex code.
Complex data structures absorb lot of the complexity of the problem and reduce the complexity of the rest of the code.
Vosporos 30 days ago [-]
Linus Torvalds also spoke in favour of using adequate data structures instead of the code.
parrit 31 days ago [-]
It is even better to have 100 functions work on 100 data structures. Powerful programming languages like Lisp and Haskell give you that. Generics gives you most of that.
If every ounce of performance matters, e.g. in a database, you want 10000 functions, 100 for each data structure.
MathMonkeyMan 31 days ago [-]
Lisp gives you an infinite amount of functions that operate on one data structure: the cons cell. :P
I guess all you really need are dynamically allocated arrays. A cons cell is an array of two. A struct with N fields is an array of N. Everything else is built on that.
pgt 31 days ago [-]
Nathan Marz' talk on Specter (Clojure Library that decouples navigation from transformation) is must-watch if you deal with data: https://www.youtube.com/watch?v=VTCy_DkAJGk
I use it in every project for data navigation and transformation, and it's more performant than standard Clojure data manipulation, while retaining types (instead of coercing back from seqs).
E.g. if you have a map and you want to increment every value in the map:
(require '[com.rpl.specter :as S])
Now let's say you have a map of vectors and want to increment all of those?
(->> {:a 5, :b 6, :c 7}
(S/transform [S/MAP-VALS S/ALL] inc)) ;; note the navigator juts got another level of nesting
=> {:a [2 3], :b [4 5], :c [6 7]}works for all clj data types, and of course it has navigators for recursive walking .
It took me a while to get good at Specter, but it was worth it. I hear Rama uses Specter navigators internally.
erichocean 30 days ago [-]
Meander is also useful if you need to map from one structure to another, and Malli if you need to write coercisions that check the validity of structures at runtime (aka "schema checking").
That's "just" a particular kind of fancy iterator that you should be able to implement in any language with iterator support. Here's one in Python:
# Expects:
# Tuple of iterateOn where iterateOn can be None
def fancyIt(init, *options):
if init != None:
yield(init)
for f in options:
newVal = f(init)
yield from fancyIt(newVal, *options)
class Tree:
def __init__(self, val, left = None, right = None):
self.val = val
self.left = left
self.right = right
def left(self):
return self.left
def right(self):
return self.right
myTree = Tree(
1,
Tree(2,
Tree(3),
Tree(4, None, Tree(5))),
Tree(6, None, Tree(7)))
for node in fancyIt(myTree, Tree.left, Tree.right):
print(node.val)
which prints the numbers 1 through 7 in order.
Breadth-first is slightly trickier, but only slightly trickier one time.
imglorp 31 days ago [-]
Yes, it seems easy to implement.
Yes, students should absolutely implement the classic algorithms to learn.
Yes, there are some occasions when you need to home grow one at $work.
BUT, in my opinion, most of the time, professional code should use a battle tested, vuln hardened library or builtin version. These things are VERY HARD to get exactly right. Jon Bently's Programming Pearls famously had a latent bug in its binary search for 20 years before someone caught it.
So yeah, it looks easy but don't do it. Stand on some giant's shoulders instead.
jerf 31 days ago [-]
Your reply is not relevant to my reply. The original poster is asking for this functionality and appears to believe it is something other than an iterator and requires some sort of special language support. However, it is completely implementable as an iterator, in a reasonably usable manner, with no additional language support. My specific code is written only to show that fact off.
Anyone who copies and pastes it is welcome to both pieces when it breaks. Others have already alluded to possible improvements that could be made, and I already have my own analysis in a grandchild reply as to why I don't think this is a terribly pressing need or necessarily even a good idea.
The reason I provide code is that it gets past the "oh, you say it's just an iterator, but I still don't believe you, since you haven't spelled it out to the n'th degree". When code is provided, belief ceases to be an issue. It is clearly something an iterator can implement, in existing languages, with existing iterator support.
Unless you're going to claim it is somehow impossible to provide this functionality in a tested manner, you're completely changing the topic in an uninteresting direction, since it is always true that functionality generally needs testing and bits of code slammed into an HN conversation just to make a particular point probably shouldn't be copied wholesale into your production code.
Amadiro 31 days ago [-]
Sure but all pre-made, battle-tested tree datastructures you'd use in production in all languages already come with some form of iterator that you can just for-loop over, so the original articles point is still moot.
thayne 31 days ago [-]
Sure, but it can be done by a library. There's no reason it needs to be built into the language.
packetlost 31 days ago [-]
Yeah, tree traversal is really easy and implementing it as an iterator is natural. Maybe don't use a recursive technique if you plan on working with non-toy datasets, Python's default stack limit is pretty small (1000), but this is otherwise a very flexible API.
While easy, I think bisect would be a good addition to every stdlib too.
jerf 31 days ago [-]
I did the tree just to match the author's example. I would agree that a bespoke iterator for breadth- and depth-first iteration for any given tree is probably a better way to go. As long as we're in a language like Python, build in something that allows you to examine a branch and decline to descend into it while you're at it.
I don't think this is a large problem in practice because you shouldn't be using dozens of tree types in a given code base, so adding iterators to a tree is no big deal. In general there aren't enough types of iteration available to a given data structure that you need to describe how to iterate on it from the "outside". (Generally when you are doing that, it's too non-trivial to fit into this pattern anyhow; see the Visitor pattern in general.) This strikes me as maybe the sort of default tool you might slap in a library somewhere, but it should be a niche tool. If you're using it all the time you're probably doing something wrong. By default your data structures should be providing iteration packaged with them and it should generally be what you need. And your language should support aborting iteration, in whatever that looks like normally. I'm not sure I know a language that doesn't, it's a fairly basic element of iterator support when you get into implementation.
There are also many cases where a tree iterator will perform significantly better, including CPython. I don't have enough experience with PyPy to know if it could inline the Tree.left and Tree.right calls down to zero penalty at JIT time. Rust and C++ and the other static languages with sophisticated compilers might be able to get that down to fully inlined and zero-cost, but even if they can it's probably better not to push that on to the optimizer as the optimizers will eventually give up if this is composed with enough other stuff. Better to just have an efficient implementation in the first place.
Well, the whole point of the blog post is to argue for a new kind of syntactic sugar and the author explicitly mentions iterators.
> Well a range based for loop requires that your tree exist in memory AND that you have an iterator defined for your tree. With for_tree you could operate on an entirely imperative tree, without needing to define any iterators or generator functions. Here's an example where I'm checking every single string composed of "a", "b", and "c" of length 8 or less.
for_tree(string x = ""; x.size() <= 8; x : {x+"a", x+"b", x+"c"}){
print(x);
}
You could definitely find every string composed of "a", "b", and "c" of length 8 or less by defining a custom iterator but it would be a verbose and unpleasant way of writing it:
class StringIterator {
public:
using iterator_category = std::forward_iterator_tag;
using value_type = std::string;
using difference_type = std::ptrdiff_t;
using pointer = const std::string*;
using reference = const std::string&;
StringIterator(bool begin = false) : is_end_(!begin) { if (begin) s_ = ""; }
const std::string& operator*() const {
if (is_end_) throw std::out_of_range("End iterator");
return s_;
}
StringIterator& operator++() {
if (is_end_) return *this;
if (s_.size() < 8) return s_.push_back('a'), *this;
while (!s_.empty() && s_.back() == 'c') s_.pop_back();
if (s_.empty()) is_end_ = true;
else s_.back() = s_.back() == 'a' ? 'b' : 'c';
return *this;
}
StringIterator operator++(int) { auto tmp = *this; ++(*this); return tmp; }
bool operator==(const StringIterator& other) const {
return is_end_ == other.is_end_ && (is_end_ || s_ == other.s_);
}
bool operator!=(const StringIterator& other) const { return !(*this == other); }
private:
std::string s_;
bool is_end_;
};
int main() {
StringIterator begin(true), end;
int count = 0;
for (auto it = begin; it != end; ++it) ++count;
std::cout << (count == 9841 ? "Pass" : "Fail") << std::endl;
return 0;
}
jerf 31 days ago [-]
def itWithStop(init, stop, *options):
if init is not None and not stop(init):
yield(init)
for f in options:
newVal = f(init)
yield from itWithStop(newVal, stop, *options)
for s in itWithStop("",
lambda x: len(x) > 2,
lambda x: x + "a",
lambda x: x + "b",
lambda x: x + "c"):
print(s)
yields the combinations of 0 - 2 length strings with a, b, and c.
Python has a number of ways to achieve this depending on exactly how you want to pass the arguments; multiple functions, optional arguments, etc. How nice the final call looks is more about your local language's closures look.
The main point here is that this will happily iterate on things that don't "exist".
module Tmp where
iter :: forall a. (a -> Bool) -> [a -> a] -> a -> [a]
iter p opts x = if p x then x:concatMap (iter p opts) (opts <*> [x]) else []
ghci> :l tmp.hs
[1 of 1] Compiling Tmp ( tmp.hs, interpreted )
Ok, one module loaded.
ghci> iter (\x -> length x < 3) [(++ "a"), (++ "b"), (++ "c")] ""
["","a","aa","ab","ac","b","ba","bb","bc",
"c","ca","cb","cc"]
(Since things are lazy in Haskell, functions that return lists effectively are iterators. There's probably something in the standard library somewhere for (opts <*> [x]) to avoid the wrapping x in an unnecessary list, but my Haskell is rusty.)
porphyra 31 days ago [-]
The iterator in my example will also happily iterate on things that don't exist. We all agree that it's possible to do in any language. But the main point is that the blog post is talking about syntactic sugar for an easier way to do it.
And yes, Haskell is amazing at this sort of thing.
jerf 31 days ago [-]
The syntactic sugar being asked for in this case is awfully thin. I definitely put this in the class of "stop pining for it and just use what's there".
If the poster wants to particularize this to C++ because C++'s syntax can't support it in any reasonable manner, that's fine, but that's a C++ problem, not a "Programming languages..." problem. Which would be perfectly understandable and I'm not really complaining, more clarifying that most of the rest of the world can just rub together three or four existing constructs in a pretty reasonable manner to get this.
samus 31 days ago [-]
Same thing, but the iterator is instead hidden inside the language implementation. `foreach` is already a quite general construct, and iterators are the way to extend them. However, I can see the benefit of designing a library to help implement more intricate iterators.
Ceterum censeo this would be a family of simple macros in LISP.
abbeyj 31 days ago [-]
It seems like it should be possible to factor out most of the iterator boilerplate into a helper class. Then each place where you want to iterate you can construct an instance of that helper class and supply a lambda that specifies how to descend into children. If you're doing the same iteration in several places then you can use a named function instead of a lambda, which means less typing for each `for` loop. Here's a rough sketch: https://godbolt.org/z/x94WY77rv
bawolff 31 days ago [-]
I kind of agree, but at the same time, for loops are just a fancy control structure that any student should be able to implement with goto.
IshKebab 31 days ago [-]
I mean people do use iterator instead of for loops quite a lot.
IMO the thing that would be really nice is if control flow like `for` was actually the same as using an iterator. This would really help in Rust too where handling errors inside iterator callbacks is a right pain.
I've seen a few languages try this but it seems to not be very popular. I think it can get a bit confusing how control flow keywords like `return` and `break` work if you turn `if` into syntactic sugar for a function call taking a closure etc.
charrondev 31 days ago [-]
In what way is for different than an iterator?
In PHP you loop through an iterator with the foreach control structure.
In JavaScript you use for of.
In rust it’s for in.
What am I missing?
IshKebab 30 days ago [-]
You're missing that I am an idiot and was mixing up iterators with functional style programming :D
bawolff 31 days ago [-]
That's why i said "goto". Iterators are already a high level cintrol structure.
jerf 31 days ago [-]
Whoops, my edit window closed, but that first comment derives from a previous version that had a different signature for the functions in the "options" list. Ignore it.
monkeyelite 31 days ago [-]
This is solved by iterators in C++. The idea of an iterators is to generalize the concept of a pointer —- something which refers to a location in your data structure.
For example the most basic operations of a pointer are to advance and dereference.
std::map is actually implemented as a tree. To iterator its members you can do
for (cost auto &pair : map)
The only requirement for your custom data structure to work is to implement begin() and end() which return iterators - “pointer like” objects.
rerdavies 30 days ago [-]
I'm surprised nobody's brought up C++ generators yet (which avoid potential problems with recursion stack depth).
for (const auto&node: await depth_first_tree(node))
And generators have the added advantage that walking trees is just a special case of the more general case of traversing a directed graph of elements, which is equally easy.
Are generators not a thing in other languages?
trealira 30 days ago [-]
I didn't even know C++ had generators, but you're right, generators make functions that walking trees much more obviously correct and convenient to write.
The site cppreference has an example of walking trees in C++ using them.
But I have a question: why is it that in your example, you write `await` before the generator function, but it's not in the example given on cppreference? Also, did you mean `co_await`?
rerdavies 28 days ago [-]
To answer your question: I have used them, but I was too lazy to look up the correct syntax. Yes. It should be co_await. The library I used was layered on top of the C++20 co-routine implementation, not c++23, and used co_await there (but probably should not have).
Do NOT use the c++20 co-routine APIs (a half-implemented nightmare of an API, although what is there does miraculously work, contrary to expectations). Probably better to wait for c++23 generators, which are so far available as an experimental feature on GCC 14.0 (which makes the feature unusable in any of my projects).
All of which, I guess, answers my question about why nobody has brought up c++ generators yet. C# has nice generators.
Sorry I brought it up. :-(
monkeyelite 30 days ago [-]
> (which avoid potential problems with recursion stack depth).
Iterators for trees are not implemented with call stack recursion.
> I'm surprised nobody's brought up C++ generators yet
You cannot modify, insert, or use standard algorithms with a generator.
yxhuvud 31 days ago [-]
No, that seems like language bloat. Better to make your language have internal iterators, as then data structures can add their own logic that matches their need for iteration. Then special cases don't need additional syntax.
naasking 31 days ago [-]
Trees aren't really a "special case". Sequences, trees and graphs are core abstractions to most programming, why try to flatten all trees and graphs to sequences?
smallnamespace 31 days ago [-]
Because your code is actual running serially so no matter what there is an iteration order, and the order matters for performance even if your code does not care.
For example if you literally don’t care about the order then your code can map over a default iterator that is efficient (DFS).
naasking 31 days ago [-]
Not true due to parallelism. Also, SQL demonstrates that you can describe the desired result declaratively without being concerned about iteration order. I see SQL's CTEs as a good example of the kind of primitive the article is talking about.
alterom 31 days ago [-]
Sure, and perhaps that's the reason why we don't have a built-in for_tree: for some people, order matters; for others, it doesn't.
Then, for some cases, depth-first traversal is needed; for others, breadth-first.
Then, there's parallelism, and even the plain old for loops aren't parallel by default.
By the time you specify exactly what you need from a tree traversal, you've written code to do it.
And if you're fine with some default choice — you already can use the default iterator with the for_each loop.
I don't see what need there is for adding an extra for_tree syntax to do that.
bloppe 31 days ago [-]
Also a lot of trees have fixed arity, like binary trees. But to model a general tree, you'd need a dynamically allocated list of children for each node, adding a layer of indirection that can noticeably worsen performance.
Trees are often considered too diverse to include even in the standard library, let alone as a primitive. Even Python doesn't have trees in the standard library. I'm sure it's been proposed and rejected at some point
naasking 31 days ago [-]
> But to model a general tree, you'd need a dynamically allocated list of children for each node, adding a layer of indirection that can noticeably worsen performance.
I hate to be a broken record, but this again goes back to the assumption of a specific sequential computation model. Modelling relations as in SQL automatically supports for N-ary child relations. There are other computation models! eg. logic, relational, etc.
bloppe 30 days ago [-]
I'm talking about cache optimization. Fundamentally, you simply cannot guarantee data locality while also allowing resizing. Growing the child list may necessitate moving it in memory. This is true no matter what your computation model is.
smallnamespace 30 days ago [-]
Those other models lack native mechanical sympathy with the hardware.
Let's be practical about it: the reason you might want to say map_tree is so it could potentially be optimized to run more quickly, e.g. via automatic parallelization.
But to even parallelize a tree read operation we'd have to pull in other threads, split work, we'd want metadata on each node (child counts), and at the end of the day within each worker you'd still have a serial execution order.
For map_tree to have practical benefit your language would need to make a ton of opinionated choices about trees and parallel programming. Feasible? Yes. Worth making it a basic primitive? Well, even LISPs don't offer map_tree, despite building the whole language on top of car/cdr.
lmm 30 days ago [-]
> Sequences, trees and graphs are core abstractions to most programming, why try to flatten all trees and graphs to sequences?
Because "trees" and "graphs" aren't actually single concepts, they're families of concepts that can vary along many dimensions. Most programming languages offer one or a small handful of sequence types, because most sequence use patterns are the same. Most languages don't offer a tree or graph type, and as soon as you try to implement one you see why: there are multiple different kinds of trees and graphs, and multiple different representations that are appropriate for different use cases, and any one you might pick would be unsuitable for most programs people want to write.
naasking 28 days ago [-]
> Most languages don't offer a tree or graph type, and as soon as you try to implement one you see why: there are multiple different kinds of trees and graphs, and multiple different representations that are appropriate for different use cases
I think it's more because the abstract interfaces for trees and graphs that can support multiple representations aren't as well known [1]. An iterator/sequence interface has a simple abstract structure that everyone immediately understands, but the structure needed to abstract over graphs and trees are trickier.
The point wasn't that it should be treated as a sequence of not, but that with good base abstractions you can move the decision of what to do, and how, into the data structure itself, rather than having to have the language decide for it.
Folding and traversing can be trivially done in all relevant programming languages if you have an iterator
trealira 31 days ago [-]
I feel like programming an iterator like this akin to a state machine is just not that convenient or trivial, though. Below is pseudo-Java.
class InOrderTreeIterator {
Stack stack;
TreeNode cursor;
InOrderTreeIterator(TreeNode root) {
cursor = root;
s = new Stack;
}
bool hasNext() {
return cursor != null || !stack.empty();
}
TreeNode next() {
if (cursor != null) {
while (cursor.left != null) {
stack.push(cursor);
cursor = cursor.left;
}
} else if (!stack.empty()) {
cursor = stack.pop();
} else {
throw new NoSuchElementException();
}
TreeNode ret = cursor;
cursor = cursor.right
return ret;
}
}
munificent 31 days ago [-]
It's much easier if the language has built-in support for generators. Here's an in-order tree iterator in Dart:
Iterable<TreeNode> inOrder(TreeNode node) sync* {
if (node.left != null) yield* inOrder(node.left!);
yield node;
if (node.right != null) yield* inOrder(node.right!);
}
trealira 31 days ago [-]
Yeah, if it has built-in support for generators, it basically makes the state machine implicit for you, which is nice and convenient; it's like going from having to manage your own stack call stack to using a programming language that just allows function calls and recursion.
ookdatnog 30 days ago [-]
Don't iterators necessarily "forget" the structure of the tree? I can see how you can write an iterator to, for example, compute the size of a tree, or generate the set of all variables that occur in an expression tree.
But it's not obvious to me how you'd write a generic iterator that supports something like finding all free variables in an expression tree, or even just express a tree mapping function that constructs a new tree with the same structure (for example, "make all variables in the expression uppercase"). It's been a while since I worked with iterators so correct me if I'm wrong and this is in fact easy with iterators.
With something like Haskell's recursion-schemes library [0] these operations are all just a few lines long, guaranteed to terminate, and don't need to be updated if your data structure changes (for example you add new expression nodes). I'm not aware of any non-functional programming language that can do that. See for example the freeVars function at the bottom of the linked recursion-schemes doc.
I do think recursion schemes are pretty cool, particularly the idea that a list fold could be generalized to the concept of catamorphisms, which could be mechanically derived upon defining a data structure, because the shape of the recursion that's defined is the same as the shape of the recursion of the data constructors. It makes me realize that programming languages still could be higher-level; what if a programming language automatically derived the equivalent of the "fold" and "map" functions for any recursive data type defined by the programmer?
At the same time, the way it's implemented in Haskell's recursion-schemes library might be hard to wrap one's head around at first, kind of like how the list functions "foldr" and "foldl" are also often confusing to newbies, even though they're like the go-to, default way to make list functions in Haskell.
s-zeng 30 days ago [-]
Haskell already supports deriving Functor and Foldable
trealira 30 days ago [-]
Forgot about that lol, my bad. I was really thinking of deriving the equivalent of catamorphisms.
munchler 31 days ago [-]
Agreed. It’s amusing to see procedural programmers slowly rediscovering what functional programmers have known for years. Paul Graham called this “The Blub Paradox”.
pmontra 31 days ago [-]
I remember that I was using trees quite often last century, when I was writing programs in C. I seldom use tree or comparably complex data structures nowadays, when I almost only write web apps (mostly backend in Ruby or Python but some frontend in JS.) I'm bet that both Ruby and Python have plenty of trees written in C inside their interpreters. I do everything with arrays (lists) and hash tables (dicts) in Ruby and Python. Maybe a language construct for trees would be nice for lower level languages and almost out of scope for higher level ones.
The same from another angle: there are a lot of trees in the indices of SQL databases (example [1]) but we don't zoom in to that level of detail very often when defining our tables.
User interface widgets form a forest - windows contain layouts that contain widgets which also can be layouts, etc, etc.
To implement Brown's algorithm to optimize class-based language models I had to implement a complex forest (DAG, actually) in Python using lists of fixed length. That was not especially nice to work with.
scotty79 30 days ago [-]
Nested dicts are trees.
walleeee 31 days ago [-]
Do you keep your dicts fastidiously flat?
pmontra 31 days ago [-]
Usually yes. Web apps are simple. There isn't much to do inside a request params to db to response call. But sometimes, not every year, there is something to really reason about. I still remember some project from 10 or 20 years ago.
Retr0id 31 days ago [-]
> "Why not just use recursive functions"
One great reason not to use recursive functions for traversing trees is that you can allocate your own stack data structure rather than relying on the call stack itself. In most languages/runtimes, the call stack has a maximum depth which limits the depth of trees you can process, usually on the order of thousands of stack frames.
Managing your own stack usually produces weirder looking code (personally I find "naive" recursive approaches more readable) - but having it as a first-class language feature could solve that!
cobbal 31 days ago [-]
If we're fixing the language, may as well fix the real problem and not artificially limit stack space.
Retr0id 31 days ago [-]
The language doesn't get much of a say, stack limits are usually inherited from the OS itself. It's fixable by not using the OS-provided stack but that's a much more invasive change than a new syntax/stdlib feature.
worthless-trash 31 days ago [-]
The compiler/language absolutely gets a say in this, you can do tail call recursion without stack saturation.
Retr0id 30 days ago [-]
Tail call elimination is an optimisation that is only possible for certain code patterns. How would you write a tail-recursive tree walk, without using a separate stack or queue data structure to store state?
worthless-trash 28 days ago [-]
Challenge accepted, I'll do it over my next long weekend.
swiftcoder 30 days ago [-]
To some extent, this seems like it's just a symptom of "writing iterators in C++ is harder than it ought to be". You don't need to have a tree in memory to build an iterator over it - folks write iterators over implicit data all the time.
Here's the range based for loop counterexample from the blog post, as a python generator:
import itertools
def iterate_tree():
for length in range(1, 8):
for combo in itertools.product("abc", repeat=length):
yield ''.join(combo)
for s in iterate_tree():
print(s)
uzerfcwn 30 days ago [-]
C++ also has generators[1], but the author is perhaps unaware of them.
"yield from" is also very useful to iterate trees in python.
soegaard 31 days ago [-]
Pick a language that allows users to define their own control structures and test your idea in practise.
Candidates: Racket, Scheme, Rust.
k__ 31 days ago [-]
You don't even have to got that far.
Defining your own iterator would be enough for most cases.
scotty79 30 days ago [-]
The point is not to do that.
scotty79 30 days ago [-]
I think Scala is also close. It has really flexible syntax.
lutusp 31 days ago [-]
The article's thesis relies on the idea that a genuinely primitive traversal action exists, in the way that a for-loop is primitive and widely applicable, or adding two floats is common enough to justify building it into the language (or processor).
But tree traversal doesn't have this universal property. There are too many methods and purposes for traversing a tree, sufficient that IMHO no single primitive embodiment could materially improve a language. Also, modern compilers efficiently break down high-level traversal code so well that expressing the idea at a high level incurs no serious penalty compared to having a primitive for that purpose, or a series of them.
fngjdflmdflg 31 days ago [-]
I read a similar argument for graphs in general,[0] where this is more obviously true.
I'm not sure if it needs to be at the syntax level, but I think having built in standard library support for graphs (general trees) would help make graphs more common in programming. And seeing how powerful they are, I think that would be a good thing!
I explored this idea with gstd, a standard library for graphs inspired by the JS Array interface: https://github.com/cdrini/gstd/
injidup 31 days ago [-]
That's what c++20 coroutines paired with c++23 generators are for. It's easy to define whatever traversal you want and then expose it as generic code.
Sure you can do it any way but the OP was about adding language features for iteration of a specific data structure. Iteration in C++ is state machine based and coroutines give you a neat way to implicitly model them without having to manage the boiler plate yourself.
uecker 30 days ago [-]
Yeah, but what I do not get is why you need all this. What problem does this solve? The C function that advances to the next tree node does exactly the same. And what is neat about it? It just seems a lot more complicated. Or in other words: A new language feature should make things simpler and not more complicated to express. But maybe I am just missing a realistic examples where it actually makes things simpler?
ellis0n 31 days ago [-]
Great example!
mubou 31 days ago [-]
I bet you could do something generic like this in languages that have deferred execution like C#'s IEnumerable. Something like
foreach (Node node in EnumerateNodes(root, x => x != null, x => [x.Left, x.Right]))
where EnumerateNodes uses `yield return` (i.e. is a generator) and calls itself recursively. Though it'd probably be easier / better performance to write an implementation specific to each node type.
taeric 31 days ago [-]
Tree traversal is an odd one to want to try and make primitive. There are a lot of hidden questions that you need to consider when traversing a tree. Would you want your syntax to indicate what order traversal? Should it indicate a way to control the extra memory that you allow for the common stack/queue used in basic traversal? If you have a threaded tree, should the syntax work without any extra memory usage?
JaumeGreen 30 days ago [-]
One of the projects that I have only in my mind space is an Iversonian language with more data structures and the tools to easily navigate an traverse them. Trees, graphs, ... the sky is the limit. And with the power of notation as a tool of thought, and a pen interface, it would be fun to do whiteboard sessions that produced real code that could be executed as is.
OTOH I read/heard that the beauty of array languages is that they have few data types, but they can be easily worked on. So maybe the answer is not easier tree traversal primitives, but better literature/training on how to transform it in a more manageable data type.
Sometimes the answer is to adapt our vision to our world, sometimes the answer is to adapt the world to our vision.
branko_d 30 days ago [-]
In C#, you can implement an iterator method with the signature like this:
public static IEnumerable<TNode> DepthFirstTraverse<TNode>(
TNode root,
Func<TNode, IEnumerable<TNode>> children
)
You can then just ‘foreach’ over it.
This doesn’t require the whole tree to be in-memory. Optimal implementation would recurse via programmer-controlled stack (instead of language-controlled) to avoid nested iterators.
A similar approach should be possible in any language which supports iterators/generators.
HelloNurse 31 days ago [-]
The simplifying assumption that all tree nodes have the same type and the same function should be called on them seems unrealistic.
This kind of imperative iteration seems better served by the traditional visitor design pattern: more verbose (more explicit, not more complex) and more general.
hyperhello 31 days ago [-]
This is interesting, but the syntax doesn't seem to have the right expressiveness for such a large change.
for recursive (Node t = tree.root; t != NULL;) {
puts(t.value);
if (t.value == target) break;
if (t.value == dontfollow) continue;
if (t.left) continue t.left;
if (t.right) continue t.right;
}
return t;
Regular 'break' is to really break out of the structure like a regular for, as regular 'continue' is to do the next iteration. But if continue has a value to recurse on, it reenters the for loop like a subroutine.
As a bonus, I think this is tail-call-optimization friendly.
toxik 31 days ago [-]
Now allow it to do BFS or DFS, suddenly you're not far from just a vanilla BFS or DFS if you have a sensible stack/queue API.
hyperhello 31 days ago [-]
DFS is pure flow control, like iterating an array. BFS isn't really the same simple deal as DFS, it either requires approaching the next level and saving it for the next depth somewhere, or navigating the entire previous level again.
gue5t 31 days ago [-]
It seems like this is grasping for languages to expose recursors[1] or induction principles. These factor out the structure of an inductive type (e.g. trees) and leave the caller to simply plug in the behavior to perform for each possible situation that might arise during traversal.
My thoughts on this are to add a third keyword alongside `break` and `continue` to loops that allow you to push another (loop) stack frame and descend into a tree-like data structure.
For exposition, lets seperate loop initialization from the rest, make `continue` mandatory to repeat the loop with another specified value (i.e. the loop implicitly ends without a continue) and lets say `descend` to recurse into the tree.
sum = 0;
loop (node = tree.root) {
sum += node.value;
if (node.left) descend node.left;
if (node.right) descend node.right;
}
Compare with a linear loop:
sum = 0;
loop (i = 0) {
if (i >= array.length) break;
sum += array[i];
continue i + 1;
}
The `descend` keyword differs from `continue` in that its more of a continuation - control flow will continue from that point once it is done.
You could make this more functional and less imperative (and I'd be surprised if such things don't exist as libraries in some functional languages). Yes ultimately we can transform this to recursive functions and compile it that way... but for imperative languages (especially those without closures) something like the above might be useful.
Arguably the above loops are _more_ imperative and less declaritive than the standard ones from the C family. You could add a functional twist by breaking with a value (which is the result of the overall `loop` expression).
vinceguidry 31 days ago [-]
When I went looking for this in Ruby, I eventually landed on RubyTree[1]. Being able to subclass the TreeNode makes everything so much friendlier. Sure, I could implement them myself, but getting converters to/from primitives for free is a lot off my mind. Would be nice to have it built into the language but honestly Ruby has a whole lot of stdlibs and default gems already.
I think Java did the right thing by making tree shaped collections, since they’re iterable and that covers the for loop situation, but it doesn’t expose the tree structure, so you can’t tell if two nodes are siblings or relatives or indeed if they are even in the same tree.
But I think grafting those two together is the right answer, not inventing a new loop construct.
Trees are easy to write iterators for. DAGs are a bit harder and full graphs are an advanced interview question.
Not as ergonomic as a direct tree-iterator, but I can't see of an elegant way to introduce that in an imperative language while keeping the forking/recursion aspect clear
Someone 30 days ago [-]
I would start with a tree primitive. That’s difficult enough, with variations on branching factor, whether the tree is self-balancing, whether it is immutable, etc)
I guess/expect that will have to expose the implementation detail of the branching factor (yes, a binary tree is a special-case of a https://en.wikipedia.org/wiki/B-tree for B = 2, but algorithms tend to be different)
On top of that, build a library of tree traversal routines (breadth first, depth first with a few variants), allowing the user to specify code that decides, on each node visited, whether to navigate further down and whether to stop the traversal.
Now, whether the language proper needs a generic way to specify tree traversals? I doubt it, and doubt even more that it is possible too that in a way so that a) most graph traversal algorithms can use the primitive and b) the result is easier to use/looks significantly nicer than just calling those routines.
meltyness 31 days ago [-]
For perspective, I'm a novice with algorithm design, I've been grinding leetcode for the past 6 months or so, almost exclusively in Rust. I was bewildered by the same concern since I had initially set out to, not only focus on Rust, but to primarily maximize use of the Iterator construct, since I was not intricately familiar with it. A few months in I discovered that there was an appropriate Iterator construct which accomplishes the same thing.
// Comments for the non-Rust native reader, regarding this Function declaration:
// successors is a function that accepts an `Option` container for some Value of type T, called `first`
// and a Closure called `succ`, constrained below:
pub fn successors<T, F>(first: Option<T>, succ: F) -> Successors<T, F> ⓘ
where
// `succ` must receive the iterated state, and return the next iterated state
F: FnMut(&T) -> Option<T>,
// Each time the `next()` function is called on the returned Iterator (a Successors-flavored iterator),
// the state of `first` is yielded, and then
// `succ` is called to progress
// until a `None` type is reported by `succ`
I'm not sure where the concept came from, but it's not dissimilar to the author's implementation, but instead of the ControlFlow enum, it relies simply on the Option enum. I know though, that it was initially built in the Itertools crate as unfold and then upstreamed some time later.
Essentially you use `first` to contain a Queue, Stack, or Level for the different traversals, and define traversal or activities from there.
It's fairly ergonomic in practice, ergonomic enough for Leetcode.
I'm 100% sure smug Haskellers have something very important to say here. :)
chowells 31 days ago [-]
You know, if Haskellers already have that functionality, maybe it's less that they're smug and more that they have already solved the problem. Do you call someone smug when you complain about being freezing and they invite you in out of the cold?
Joel_Mckay 31 days ago [-]
Normally someone would respond, but they are prone to lazy evaluation...
Thank you... I'll see myself out... lol =3
tbrownaw 31 days ago [-]
Standard tree traversal loop (yes, missing some obvious conditionals I didn't feel like typing out):
var todo = new List<T>();
todo.append(root);
while (var item = todo.pop_front()) {
todo.append(item.left); // or .prepend for depth-first
todo.append(item.right); // or .prepend()
// do stuff...
}
pseudocomposer 31 days ago [-]
Couldn't you just do this with a regular for loop and a few datatypes/functions? (This pseudocode is an Elm/Haskell/Rust/TypeScript-inspired abomination, but pretty portable to any language...)
type Node = { value: Any, left: Node, right: Node }
type Direction = Left | Right
type TreePosition = { root: Node, currentNode: Node = root, position: Direction[] = [] }
# Implementation left as an exercise but should be obvious and run in O(1), I believe. Returns Nothing when we're out of nodes.
function nextPosition(position: TreePosition): Option<TreePosition>
# The tree you want to iterate through
const myTree: Node = ...
# The loop
for(let position: TreePosition? = TreePosition(root: myTree); position != Nothing; position = nextPosition(position) {
node = position!.currentNode
# Your loop code
}
I'd argue this doesn't belong as a language-level feature, but maybe an API/stdlib-level feature.
xxs 31 days ago [-]
Indeed, I don't see any need to have a tree as a language primitive along with a traversal function (iterator). Tree traversal is not really different than iterating over a vector/list. E.g. even java has stuff like: TreeSet.forEach(consumerRef) or for(Type val : tree) doStuffWith(val)
HelloNurse 30 days ago [-]
Trees are an insignificant middle ground between lists (which, by the way, are degenerate trees) and general iterators (on general object graphs, or not even materialized).
func Search(Root : Tree; Desired_Value : Tree_Value)
-> optional Tree_Id is
var Identifier : optional Tree_Id := null;
for T => Root then Left(T) || Right(T)
while T not null concurrent loop
if Value(T) == Desired_Value then
// Found desired node,
// exit with its identifier
exit loop with Identifier => Key(T);
end if;
end loop;
return Identifier;
end func Search;
I actually don't use this feature much, because usually there is a key of some sort that determines which side of the tree is of interest.
pbohun 31 days ago [-]
I think this kind of control mechanism is not a great idea and could lead to easy bugs.
I think it's a better idea to write a function that applies a function to the elements of a tree. Ideally you'd make a function for each traversal order. This makes it obvious what is happening.
map_tree_pre(my_tree, my_func);
map_tree_in(my_tree, my_func);
map_tree_post(my_tree, my_func);
map_tree_bfs(my_tree, my_func);
31 days ago [-]
adamc 31 days ago [-]
Or maybe you need a different language, where implementing an efficient tree traversal operation is easier?
foota 31 days ago [-]
One thing I wish for is for standard library trees to present a way to take advantage of a trees structure, even if they don't directly expose the tree itself. For instance, C++ map could expose a way to start find or e.g., lower_bound from some node. While I'm making a wishlist, I also wish lower_bound worked with reverse iterators (see e g., https://stackoverflow.com/questions/64351187/stdlower-bound-...) :)
I think allowing for starting this kind of search from a given node would cover most (though not all) of what you'd want to expose the trees structure for directly.
scotty79 30 days ago [-]
Branching iteration seems like a great idea. But I think the syntax shouldn't come from tree traversal. It should be something general. You tell the normal for loop how to go to the next sibling (`increment`) and when to end naturally (`condition`) or prematurely (`break`) and how to skip current sibling (in whole or partially) (`continue`). Branching iteration would also need to know how and when to branch, that includes skipping branching for this particular item, end whole iteration naturally and prematurely.
So something similar to `break` and `continue` should be the method of branching (descending into the tree).
I think Scala might be a great language to prototype implementation of such construct.
mpweiher 30 days ago [-]
Adaptive Programming[1] by Karl Lieberherr is the most comprehensive attempt I have seen to formalize and separate iteration over arbitrary structures as a distinct PL construct.
It allows you to specify traversal in a structure-shy way. Personally, I think it goes a little too far, but it's definitely a good inspiration.
> This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Also I’m slightly confused by this example.
for_tree(string x = ""; x.size() <= 8; x : {x+"a", x+"b", x+"c"}){
print(x);
}
So, our “next node” operation is to concatenate to x. Won’t we either have to have a method for modifying x to go “up” a node, or we’ll have to keep a record of what x was upon entering each node? Like in this example we’ll end up with x=“aaaaaaaa” and then go up a node, over to a “b” node, and get x=“aaaaaaaab”, right?
voidUpdate 31 days ago [-]
The author said it could be implemented recursively, so a call with x="aaaaaaa" would call three more functions with x="aaaaaaaa", "aaaaaaab" and "aaaaaaac". you never need to go up a node
dmurray 31 days ago [-]
I think the intention is that you keep a record of what x is in the current node, and at every node above the current node. In the recursive implementation which the author describes as equivalent, these values are kept in the stack.
bee_rider 31 days ago [-]
Seems a bit inefficient…
I guess we can delete the a node’s copy of x after all of a node’s child nodes are visited, at least.
dmurray 31 days ago [-]
Well, it doesn't need to be any worse than the stack-based implementation of a recursive tree traversal, and it can't be significantly better. You have to store that state somewhere.
Perhaps it can be optimized to be a little better than the recursive version, depending on how much overhead your language uses for a stack frame that it won't need for this special case.
demarq 31 days ago [-]
I would think iterators are exactly this. And they exist in almost every language.
kelseyfrog 31 days ago [-]
I've written about this before, but suggesting recursion for tree traversal is equivalent to suggesting goto is a replacement for if/for/while.
We don't have syntax and semantics for recursion schemes in any programming language - it's always deferred to library support at best. As far as I'm concerned, this is an open problem in programing language design where we finally replace the vestigial recursion hack with a proper structured programming solution.
skribanto 31 days ago [-]
Well, recursion is well-structured and can be reasoned about through induction. Goto is unstructured…
JonChesterfield 31 days ago [-]
This implementation recurses on the call stack and calls that out as a feature. A built into the language tree walk which overflows the stack on large trees is perfect for C++, get a paper over to WG21.
Or for a possibly more useful comment, constant space tree traversal is genuinely quite difficult to implement. I don't know a general solution to it (other than walk from the start N times, quadratic style), would be interested to hear of one.
trealira 29 days ago [-]
> Or for a possibly more useful comment, constant space tree traversal is genuinely quite difficult to implement. I don't know a general solution to it (other than walk from the start N times, quadratic style), would be interested to hear of one.
There's Morris tree traversal. It basically turns an ordinary binary tree into a threaded binary tree and fixes it up after it's done. But, this is mainly possible because of the space wasted by the null pointers at the leaves of the tree. You can hide a stack within the tree.
HelloNurse 31 days ago [-]
Constant space tree traversal is impossible and unnecessary. On the other hand predictable space, including adding parent pointers to cheat, is easy to achieve by counting the total number of nodes or the depth of the tree.
somat 31 days ago [-]
"Constant space tree traversal is impossible..."
Naively I would expect something like left-hand-rule maze traversal to be constant space. The tree may need additional structures to support this sort of travel.
Other thoughts: most state machines are constant space.... there may be something there.
HelloNurse 30 days ago [-]
The "left-hand-rule maze traversal" requires backing out of dead ends (in a tree data structure, up from leaf nodes), which amounts to one permanent parent pointer per node instead of a transient stack of nodes that we have not finished visiting (in addition to the baseline of child pointers).
JonChesterfield 29 days ago [-]
For what it's worth, repeatedly traversing from the root _doesn't_ require backing out of dead ends, if you can avoid traversing into dead ends. Specifically if you store the size of the subtree at the node you can traverse directly to the ith element from the root, and do that N times and you get a traversal.
That constant space used to store the size could instead be a pointer to the root, but all the use cases I have for trees involve aliasing subtrees in which case there can be multiple "parents" for a given subtree and one cannot point to it.
In the spirit of completeness, my standard tree traversal recurses to a fixed depth and then falls back to the walk-repeatedly-from-the-root, which is constant space of a sort, but feels suboptimal.
HelloNurse 29 days ago [-]
Space proportional to node count (used for a pointer or a tree size or something fancier) is up to exponentially larger than space proportional to tree depth (for a stack representing the state of a traversal), the same if every node has only one child.
It might be a competitive approach if the tree is very unbalanced or there are enough concurrent traversals, but in main experience the main reason to use parent pointers or the like are operations that are not visiting each node once starting from the root.
kevinventullo 31 days ago [-]
If nodes have pointers to their parent, constant space tree traversal is easy enough.
MontagFTB 31 days ago [-]
Sean Parent developed ‘forest’ for C++ that automatically maintains the invariant that its structure is always a hierarchy. It includes full-order iteration through its default iterator: https://stlab.cc/2020/12/01/forest-introduction.html
usrnm 31 days ago [-]
Don't need to go that far, std::map is a tree (the standard does not dictate it, but it is in all ilmplementations), and you could always traverse it. The whole idea of iterators was popularized by C++ and its STL.
MontagFTB 31 days ago [-]
std::map uses a tree as an implementation detail to achieve certain performance guarantees. It is not a tree from the user’s perspective, however. That is, there are no parent/child relationships between the elements in a std::map.
lucasoshiro 31 days ago [-]
This is heavy biased to C++, which is a language that already has too many features that people just want to use. This is a very specific case to be placed as a language feature, and could be just a lib.
If this becomes a C++ feature, imagine how many data structures we would need to support?
Many other languages, specially the FP languages, allow to do that as a library. Even the languages that are only inspired by FP. Example, Ruby:
class BinTree
include Enumerable
def initialize v, l, r
@v, @l, @r = v, l, r
end
def each &block
@l.each(&block) unless @l.nil?
yield @v
@r.each(&block) unless @r.nil?
end
end
Using the Enumerable mixin includes many FP-based methods, such as map, filter and reduce by only defining each, which in this case is DFS.
And so on. The same can be done in Python, Kotlin and many others.
monkeyelite 31 days ago [-]
> If this becomes a C++ feature, imagine how many data structures we would need to support?
C++ already solved that problem. Iterator are designed so that an algorithm can be written once and used with multiple data structures.
lucasoshiro 31 days ago [-]
So we can go back to just use them and just write a new library
monkeyelite 31 days ago [-]
You don’t need the standard library to make iterators.
lucasoshiro 30 days ago [-]
So you can use it.
I really don't see your point
jonstewart 31 days ago [-]
I worked for Guidance Software in the aughts and their main product, EnCase, had its own proprietary scripting language built in, called EnScript. Everything in EnCase inherited from “NodeClass”, a linked list for creating trees.
EnScript had a forall(NodeClass n in tree.GetRoot(){} construct that was very easy. It was essentially a depth-first iterator.
globular-toast 30 days ago [-]
For isn't "linear traversal", it's linear execution. It's up to the data structure to define what linear traversal is. For arrays and lists that's trivial. For trees and, more generally, graphs there are many ways to do traversal. This is what iterators are for.
31 days ago [-]
systems 31 days ago [-]
well, functional languages with recursive types, are very good at representing
binary trees
Once you have algebraic data-types in a language, writing a recursive visitor pattern is pretty simple.
Encoding the semantics of a tree traversal operator likewise is difficult in the general case. What exactly would the order be, what if I want to traverse in a non-standard ordering, what about skipping branches; all would be difficult to cleanly represent.
I have seen it done where you return actions with key ones being recurse, stop, replace, and replace & then carry out some function, but again, this is pretty simple to implement.
MJGrzymek 31 days ago [-]
sticking to dfs, there is still a difference between pre-order/in-order/post-order
thatguysaguy 31 days ago [-]
It's not at the language level, but for python JAX has the notion of a pytree (arbitrarily nested combination of lists, dicts, and tuples), and includes map and reduce operations for those objects. It's very convenient!
Though Haskell's Traversable is similar in name, then depending on what the developer intends, both Functor, Foldable or Monad could also help with the traversal. I.e, Haskell already has what the blog post asks for.
qoez 31 days ago [-]
How in the world is this getting a HN hug of death when it's a substack page?
f1shy 31 days ago [-]
Well common lisp has it.
SuperV1234 31 days ago [-]
What you want is 'yield' and generator coroutines. We have them in C++20, but their codegen is very poor compared to a manual implementation.
drbojingle 30 days ago [-]
What programming languages need is dimensional analysis. The fact that I can make variables like
Const km = 1
Const velocity = 11
And do km + velocity = 12 is all kinds of silly.
erichocean 30 days ago [-]
What programming languages need is linguistic analysis. The fact that I can make variables like
Const true = False Const false = True
And do if (true) "won't happen" else if (false) "will happen" is all kinds of silly.
zupa-hu 30 days ago [-]
That exists, it is called a nominal type system, while in your example the language has a structural type system.
Edit: Assuming your example is even statically typed.
kazinator 31 days ago [-]
Here you go!
A TXR Lisp macro for-tree for traversing a tree, given a variable, an expression for the root node, and expressions for how to to left and right relative to the variable.
A node structure that for-tree knows nothing about. No iterator abstraction or API, nothing.
(defstruct node ()
value
left
right)
Example tree (a balanced binary tree whose numeric values happen to be sorted):
(defvar example-tree #S(node value 5
left #S(node value 3
left #S(node value 1)
right #S(node value 4))
right #S(node value 7
left #S(node value 6)
right #S(node value 9)))
The left-expr and right-expr must be expressions that involve the var variable; the construct evaluates these to find the left or right child. When the value is nil it means there is no child.
This is almost exactly what the blog is asking for, transliterated to Lisp syntax.
Original concept in fantasy C syntax:
for_tree(Node* N = mytreeroot; N != NULL; N : {N->left, N->right}
{
print(N->value);
}
As required, the construct specifies the node variable to be the loop dummy, the root expression giving its initial value and the two accessor expressions for left and right traversal.
The termination test N != NULL is made implicit in the Lisp version, so early termination requires a break out of the loop. It could be arranged.
The fantasy C syntax specifies a variable name in the navigation declarator: N : { N->left, N->right }. Presumably, the variable here can be renamed to anything you want; it just arbitrarily happens to have the same name as the loop dummy.
I didn't replicate this feature exactly because it just adds verbosity for nothing. It's okay if the navigation declarator just refers to the loop's one and only dummy variable.
Anyway, programming languages should definitely not have a tree traversal primitive. Rather, languages should be Lisp.
kazinator 31 days ago [-]
Here is the same for-tree macro processing TXR Lisp's built-in search tree type. That does not use structs, but built in node and tree types.
The left, right and key fields of a node object are accessed with like-named functions. We must call tree-root on a tree object to get to the encapsulated root node:
How about a tree of integers. Let's say the root integer is 1, and to go left is to multiply by 2, and to go right is to multiply by 2 and mask in a 1. And let's say the depth caps out at the value 16:
3> (for-tree n 1 (if (< n 8) (* 2 n)) (if (< n 8) (+ 1 (* 2 n)))
(format t "~b\n" n))
1000
100
1001
10
1010
101
1011
1
1100
110
1101
11
1110
111
1111
nil
Let's right-align digits to see different patterns in it:
2> (for-tree n 1 (if (< n 8) (* 2 n)) (if (< n 8) (+ 1 (* 2 n)))
(format t "~6b\n" n))
1000
100
1001
10
1010
101
1011
1
1100
110
1101
11
1110
111
1111
nil
Nah, left-aligned wins.
kazinator 31 days ago [-]
I forgot I can remove the outer parentheses, and also use infix with C-like operators, since I have those modes turned on in the REPL. (Infix is an unreleased feature, coming in TXR 300).
programming languages shouldn't have such primitives, because they're not primitive. Otherwise, people programming different stuff all the time, would need other primitives, and you'd have unmaintainable languages full of bugs.
31 days ago [-]
rurban 28 days ago [-]
Of course not. You already have iterators.
31 days ago [-]
eru 31 days ago [-]
Haskell (for example) doesn't even have a looping primitive. Why would it have a tree traversal primitive?
Both looping and tree traversal can be done with library functions.
In general, everything that can be done with library functions, should be done with library functions.
> Doesn't for_tree(...) look a lot nicer and simpler and less error prone than needing to implement a recursive function for each operation you would want to do on a tree?
Eh, in Haskell you can derive many of these things automatically. And even Rust has some automatic derivation.
specialist 31 days ago [-]
FWIW, I use Null Objects to eliminate null checks. Makes recursion (tree climbing) more concise, legible.
Bonus: Java's HotSpot magick will NOP (most?) methods of Null Objects, making this a zero cost abstraction.
I should probably write a concrete example for a blog or something.
TLDR: For every base class such as TreeNode, create a NullTreeNode that does nothing, then replace all uses of null with NullTreeNode. Voilá, no more null checks or NPEs.
colbyn 31 days ago [-]
It's called sum types and pattern matching, and this is like 80% of the reason why I like Swift and Rust.
jdeaton 31 days ago [-]
Its called recursion
> Doesn't for_tree(...) look a lot nicer and simpler and less error prone than needing to implement a recursive function for each operation you would want to do on a tree?
No it does not
austin-cheney 31 days ago [-]
Operating systems have this with file systems.
Web browsers have this. It’s the DOM and it scares the shit out of most JavaScript developers. The fun part is confronting people about their irrational fears watching them spin their wheels with all the imagined excuses.
I am a tremendous fan of tree systems. Tree systems should completely replace inheritance in programming to enforce the composition over inheritance principle.
Mikhail_K 31 days ago [-]
Ordering pizza is much more important and frequent part of a developer's work than tree traversal. Therefore, programming languages need pizza ordering primitive even more, than tree traversal ones.
chess_buster 31 days ago [-]
Nonsense.
Programming languages should have less primitives like this and instead we should have better foundation libraries for the languages, i.e., containing iterator/-interfaces like Rust and Python (or Smalltalk).
Nerdx86 29 days ago [-]
My concern here is what a recursive primitive like this in a low level language (like C/C++/Zig/Rust) would imply to the execution context of the "code within the brackets". If it was within the context of a lambda, etc that's one thing.
I will be transparent here, I do not know of a non recursive, tree traversal alogorthim that uses constant storage but doesn't modify the tree.. (Morris Traversal satisfies 2 but makes temporary changes to the tree)
Most primatives do not allocate hidden data (even C++ iterators require you to instantiate the iteration state as a variable) and operate within a known stack/execution context.
Having an actual primitive that implements arbitrary tree traversal would require a substantial complier generated, but non-transparent, framework for that execution. It would either need to be recursive, have localized dynamic data, or would require tree structures that are well defined to use something like Morris. (And Morris is natively non-concurrent even for reads)
While this means that that simple primitive would actually do a lot of potentially problematic things under the hood. How many Junior programmers would call it concurrently for reads because it's not "modifying the data". Why do you need to lock? Things like brake and continue and other control mechanisms wouldn't actually work as assumed from the flat context.... I.E what happens if somebody called a goto inside of that block? Also, in an embedded context where you may not have a dynamic allocator, how would a version written as in reiterative function with a expanding stack data structure function? From a language standard point of view, there are so many complexities to building something like this that runs in any context that the language is supported.
Something like this that runs in any context that the language is supported.
I mean look at C++, they have specifically and (deliberately / arguably rightfully) failed to build copy on write strings (a simpler abstraction) because there are cases where they couldn't guarantee that they would have the expected results in all contexts.
This to me sounds more like something that should be a lambda function and a standard library, not a primitive... Languages that support dictionaries and maps as part of their standard data structures already have library support for tree traversal internally. They have solved many of the problems of making that language exposed, so that would be a much smaller ask. And programmers rightfully assume a lot more complexity within library calls then they do within primitives. Primitives are designed to be primitive. Having primitives able to create stack frames, dynamically allocate hidden memory, and/or recurse honestly worries me.
mrorigo 31 days ago [-]
[dead]
suddenlybananas 31 days ago [-]
Why clog up crates.io with AI slop?
jheriko 31 days ago [-]
[dead]
meindnoch 31 days ago [-]
Horrible idea.
1. BFS is not supported.
2. Traversal type (inorder, preorer, postorder, or even mixed order!) is also not handled.
3. The syntax doesn't make it apparent that stack overflow can occur, e.g. by doing DFS on a linked list.
danieloj 31 days ago [-]
As a fullstack web engineer I've never had to implement a tree structure at work. I'd love to hear examples of what kinds of companies/platforms people are writing these structures regularly. If anyone is willing to share where they use them I'd appreciate it
rcxdude 30 days ago [-]
I've used them (though more generally a directed acyclic graph) when doing certain kinds of analysis of code execution paths (working in embedded software). They do tend to show up in UI work as well (in an ad-hoc, implicit fashion), in my experience, though generally when you're looking at something across the whole UI as opposed to implementing a specific element of it. I've also used them in a somewhat ad-hoc build-system like utility as well.
(And to be clear, none of these involve sitting down and writing some generic Tree<T> structure, they're all cases of "Well, there's some tree-like datastructure that already exists or is a natural fit for this system and I'm going to be traversing it depth-first or breadth-first to do something to the elements of it or calculate something based on what nodes I pass through on a given traversal")
mvc 31 days ago [-]
You might not implement them but as a web engineer you're using them all the time. So all these tools that you use will have tree implementations in them.
- Every html document is a tree structure. And css documents have special syntax to address nodes within those trees
- If it's any good, you're routing framework probably uses some kind of tree to quickly match the request to it's handler
- The database you write to uses a tree to quickly find the location it needs to write to
chuckadams 31 days ago [-]
I think that’s kind of the GP’s point, that there’s no “implementing a tree” so much as just using the tree that’s naturally there. When I make nested objects, I don’t think of trees or some boxes-and-arrows diagram, I just think “product type”. It’s good to know your graph algorithms, sure, but thinking about the data representation isn’t something one needs to have in the front of their mind in any decently abstracted language.
Philpax 31 days ago [-]
The (V)DOM is a tree. Knowing that is useful for manipulating and composing it.
Major tools that exist today for partial structure traversal and focused manipulation:
- Optics (Lenses, Prisms, Traversals)
- Zippers - Query Languages (for semantic traversal and deep search)[1] https://www.cs.ox.ac.uk/jeremy.gibbons/publications/iterator...
The problem is that 'lens', 'monocle', etc. are famously abstract and difficult for people to apply to their actual problems. IMO, the solution would be for standard libraries to specify interfaces called 'BreadthFirstTraverse', 'DepthFirstTraverse', etc.
Optics are famously abstract in implementation, but I don't think people have trouble applying them - people seem to like JQuery/CSS selectors, and insist on `object.field` syntax; it's kind of wild that no mainstream language has a first-class way to pass around the description of a location in an arbitrary data structure.
[1] https://ghc-proposals.readthedocs.io/en/latest/proposals/002...
[2] https://ghc-proposals.readthedocs.io/en/latest/proposals/015...
[1] https://en.cppreference.com/w/cpp/types/offsetof
[1] https://github.com/graninas/cpp_lenses [2] https://github.com/jonsterling/Lens.hpp
I think people are often too enamored by general purpose languages that can express such abstractions natively. I don't see an issue with a language that provides this as a primitive without being able to express it itself, constraints can be useful for other properties. Once you can traverse trees, most programming problems can be tackled even in such constrained languages, eg. SQL with CTE.
For C++, you could define yourself a template that expands to the two functions you listed.
For any language, you could write yourself a pre-processor that adds for_tree notation and expands it, either with pre-processor semantics or working on the abstract syntax tree (which is more "proper" but also more work"). I would recommend the former to test notations, and then you can live with them for a while to experiment with them and see if and how often you really need the construct (recall that is how C++ evolved from C via "C with classes" - it was first a pre-processor). Once you and others are 100% convinced of the new syntax, go to your prefered language's working group/ISO committee and propose inclusion.
My own feeling is that this is not something for which I need more than recursion; calling inside traverse() traverse(left) and traverse(right) for binary trees or using a normal for loop to iterate over all this->children() from 0 to this->children.size() is something that occurs in graph libraries and parsers/compilers once in a while but not often enough to warrant its own notation. Rather, when I look at languages like C++, I'd have a language that is simpler, cleaner, more homogeneous and more orthogonal; C++ is complicated, convoluted, too large to implement it correctly by a single person in reasonable time (compared to beauties like C, Pascal, Scheme), so I stand on the side of minimalism.
Be that as it may, for C++, Eric Neibler's [Boost.Proto](https://www.boost.org/doc/libs/1_84_0/doc/html/proto.html) could likely help conveniently connect syntax to implementation to achieve something similar to what the author is taking about.
Definitely can be done to streaming data… protobufs in a way is this, given it’s a sort of BNF from a bird’s eye.
https://wiki.haskell.org/index.php?title=Research_papers/Gen...
Think of these as programmable accessors and updaters.
How is iterating through something already not 'programmable' ?
If you read a bit more about them, I think you will be surprised to see the breadth of what these abstractions can be used for. To start, they've been used to build a new compositional foundation for game theory[1], and they can be used to model gradient-based learning in machine learning[2].
As for their simpler applications as getters and setters, they are programmable in the sense that you can think of lens/prism types as interfaces that can be implemented arbitrarily. So you can create your own lenses and combine them like legos to construct new, "bigger" getters and setters from the smaller components.
[1] https://arxiv.org/pdf/1603.04641 [2] https://arxiv.org/abs/2103.01931
EDIT: fixed typo
When does someone give up on the pagentry of programming and just get something done by looping through data instead of "constructing getters and setters to model gradient based machine learning".
It really seems like the straight forward way of doing things is to make simple iteration and worry about the game theory after that is all figured out.
I think your comment is valuable in spirit because it could bring a more thorough discussion of the creation or popularization of a syntactically clean and a shift of the idiomatic-ness of such a solution to traversing trees. Leaving the more abstract and general Optics discussion to be a side dish for commenters to discuss as well.
Also, just a nitpick, but traversing a tree is a broader topic than iteration. Iteration implies, both colloquially and formally, performing a computation on each element of a data structure, while traversal is a more general algorithm that could be the basis of iteration, but may also be more akin to a search through the data structure without expectation of performing computation until the searched for ‘member’ is returned from the traversal. So it’s not an apples-to-oranges comparison with your conflation of iteration and traversal, but does seem a little like Canis Familiaris-to-Mammal comparison.
https://clojuredocs.org/clojure.zip/zipper
Complex data structures absorb lot of the complexity of the problem and reduce the complexity of the rest of the code.
If every ounce of performance matters, e.g. in a database, you want 10000 functions, 100 for each data structure.
I guess all you really need are dynamically allocated arrays. A cons cell is an array of two. A struct with N fields is an array of N. Everything else is built on that.
I use it in every project for data navigation and transformation, and it's more performant than standard Clojure data manipulation, while retaining types (instead of coercing back from seqs).
E.g. if you have a map and you want to increment every value in the map: (require '[com.rpl.specter :as S])
``` (->> {:a 5, :b 6, :c 7} (S/transform [S/MAP-VALS] inc)) => {:a 6, :b 7, :c 8} ```
^ try to write that in normal clojure.
Now let's say you have a map of vectors and want to increment all of those? (->> {:a 5, :b 6, :c 7} (S/transform [S/MAP-VALS S/ALL] inc)) ;; note the navigator juts got another level of nesting => {:a [2 3], :b [4 5], :c [6 7]}works for all clj data types, and of course it has navigators for recursive walking .
It took me a while to get good at Specter, but it was worth it. I hear Rama uses Specter navigators internally.
https://www.metosin.fi/blog/transforming-data-with-malli-and...
Breadth-first is slightly trickier, but only slightly trickier one time.
Yes, students should absolutely implement the classic algorithms to learn.
Yes, there are some occasions when you need to home grow one at $work.
BUT, in my opinion, most of the time, professional code should use a battle tested, vuln hardened library or builtin version. These things are VERY HARD to get exactly right. Jon Bently's Programming Pearls famously had a latent bug in its binary search for 20 years before someone caught it.
https://research.google/blog/extra-extra-read-all-about-it-n...
So yeah, it looks easy but don't do it. Stand on some giant's shoulders instead.
Anyone who copies and pastes it is welcome to both pieces when it breaks. Others have already alluded to possible improvements that could be made, and I already have my own analysis in a grandchild reply as to why I don't think this is a terribly pressing need or necessarily even a good idea.
The reason I provide code is that it gets past the "oh, you say it's just an iterator, but I still don't believe you, since you haven't spelled it out to the n'th degree". When code is provided, belief ceases to be an issue. It is clearly something an iterator can implement, in existing languages, with existing iterator support.
Unless you're going to claim it is somehow impossible to provide this functionality in a tested manner, you're completely changing the topic in an uninteresting direction, since it is always true that functionality generally needs testing and bits of code slammed into an HN conversation just to make a particular point probably shouldn't be copied wholesale into your production code.
While easy, I think bisect would be a good addition to every stdlib too.
I don't think this is a large problem in practice because you shouldn't be using dozens of tree types in a given code base, so adding iterators to a tree is no big deal. In general there aren't enough types of iteration available to a given data structure that you need to describe how to iterate on it from the "outside". (Generally when you are doing that, it's too non-trivial to fit into this pattern anyhow; see the Visitor pattern in general.) This strikes me as maybe the sort of default tool you might slap in a library somewhere, but it should be a niche tool. If you're using it all the time you're probably doing something wrong. By default your data structures should be providing iteration packaged with them and it should generally be what you need. And your language should support aborting iteration, in whatever that looks like normally. I'm not sure I know a language that doesn't, it's a fairly basic element of iterator support when you get into implementation.
There are also many cases where a tree iterator will perform significantly better, including CPython. I don't have enough experience with PyPy to know if it could inline the Tree.left and Tree.right calls down to zero penalty at JIT time. Rust and C++ and the other static languages with sophisticated compilers might be able to get that down to fully inlined and zero-cost, but even if they can it's probably better not to push that on to the optimizer as the optimizers will eventually give up if this is composed with enough other stuff. Better to just have an efficient implementation in the first place.
(this is for iterating over nested JSON-like objects, which are just weird trees)
There are a lot of ways you could avoid the recursion, but that's a particularly nice way!
[1] https://doc.rust-lang.org/std/vec/struct.Vec.html#method.bin...
> Well a range based for loop requires that your tree exist in memory AND that you have an iterator defined for your tree. With for_tree you could operate on an entirely imperative tree, without needing to define any iterators or generator functions. Here's an example where I'm checking every single string composed of "a", "b", and "c" of length 8 or less.
You could definitely find every string composed of "a", "b", and "c" of length 8 or less by defining a custom iterator but it would be a verbose and unpleasant way of writing it:Python has a number of ways to achieve this depending on exactly how you want to pass the arguments; multiple functions, optional arguments, etc. How nice the final call looks is more about your local language's closures look.
The main point here is that this will happily iterate on things that don't "exist".
(Since things are lazy in Haskell, functions that return lists effectively are iterators. There's probably something in the standard library somewhere for (opts <*> [x]) to avoid the wrapping x in an unnecessary list, but my Haskell is rusty.)And yes, Haskell is amazing at this sort of thing.
If the poster wants to particularize this to C++ because C++'s syntax can't support it in any reasonable manner, that's fine, but that's a C++ problem, not a "Programming languages..." problem. Which would be perfectly understandable and I'm not really complaining, more clarifying that most of the rest of the world can just rub together three or four existing constructs in a pretty reasonable manner to get this.
Ceterum censeo this would be a family of simple macros in LISP.
IMO the thing that would be really nice is if control flow like `for` was actually the same as using an iterator. This would really help in Rust too where handling errors inside iterator callbacks is a right pain.
I've seen a few languages try this but it seems to not be very popular. I think it can get a bit confusing how control flow keywords like `return` and `break` work if you turn `if` into syntactic sugar for a function call taking a closure etc.
In PHP you loop through an iterator with the foreach control structure.
In JavaScript you use for of.
In rust it’s for in.
What am I missing?
For example the most basic operations of a pointer are to advance and dereference.
std::map is actually implemented as a tree. To iterator its members you can do
The only requirement for your custom data structure to work is to implement begin() and end() which return iterators - “pointer like” objects.Are generators not a thing in other languages?
The site cppreference has an example of walking trees in C++ using them.
https://en.cppreference.com/w/cpp/coroutine/generator
But I have a question: why is it that in your example, you write `await` before the generator function, but it's not in the example given on cppreference? Also, did you mean `co_await`?
Do NOT use the c++20 co-routine APIs (a half-implemented nightmare of an API, although what is there does miraculously work, contrary to expectations). Probably better to wait for c++23 generators, which are so far available as an experimental feature on GCC 14.0 (which makes the feature unusable in any of my projects).
All of which, I guess, answers my question about why nobody has brought up c++ generators yet. C# has nice generators.
Sorry I brought it up. :-(
Iterators for trees are not implemented with call stack recursion.
> I'm surprised nobody's brought up C++ generators yet
You cannot modify, insert, or use standard algorithms with a generator.
For example if you literally don’t care about the order then your code can map over a default iterator that is efficient (DFS).
Then, for some cases, depth-first traversal is needed; for others, breadth-first.
Then, there's parallelism, and even the plain old for loops aren't parallel by default.
By the time you specify exactly what you need from a tree traversal, you've written code to do it.
And if you're fine with some default choice — you already can use the default iterator with the for_each loop.
I don't see what need there is for adding an extra for_tree syntax to do that.
Trees are often considered too diverse to include even in the standard library, let alone as a primitive. Even Python doesn't have trees in the standard library. I'm sure it's been proposed and rejected at some point
I hate to be a broken record, but this again goes back to the assumption of a specific sequential computation model. Modelling relations as in SQL automatically supports for N-ary child relations. There are other computation models! eg. logic, relational, etc.
Let's be practical about it: the reason you might want to say map_tree is so it could potentially be optimized to run more quickly, e.g. via automatic parallelization.
But to even parallelize a tree read operation we'd have to pull in other threads, split work, we'd want metadata on each node (child counts), and at the end of the day within each worker you'd still have a serial execution order.
For map_tree to have practical benefit your language would need to make a ton of opinionated choices about trees and parallel programming. Feasible? Yes. Worth making it a basic primitive? Well, even LISPs don't offer map_tree, despite building the whole language on top of car/cdr.
Because "trees" and "graphs" aren't actually single concepts, they're families of concepts that can vary along many dimensions. Most programming languages offer one or a small handful of sequence types, because most sequence use patterns are the same. Most languages don't offer a tree or graph type, and as soon as you try to implement one you see why: there are multiple different kinds of trees and graphs, and multiple different representations that are appropriate for different use cases, and any one you might pick would be unsuitable for most programs people want to write.
I think it's more because the abstract interfaces for trees and graphs that can support multiple representations aren't as well known [1]. An iterator/sequence interface has a simple abstract structure that everyone immediately understands, but the structure needed to abstract over graphs and trees are trickier.
[1] https://www.cs.tufts.edu/~nr/cs257/archive/andrey-mokhov/alg...
But it's not obvious to me how you'd write a generic iterator that supports something like finding all free variables in an expression tree, or even just express a tree mapping function that constructs a new tree with the same structure (for example, "make all variables in the expression uppercase"). It's been a while since I worked with iterators so correct me if I'm wrong and this is in fact easy with iterators.
With something like Haskell's recursion-schemes library [0] these operations are all just a few lines long, guaranteed to terminate, and don't need to be updated if your data structure changes (for example you add new expression nodes). I'm not aware of any non-functional programming language that can do that. See for example the freeVars function at the bottom of the linked recursion-schemes doc.
[0] https://hackage.haskell.org/package/recursion-schemes-5.2.3
At the same time, the way it's implemented in Haskell's recursion-schemes library might be hard to wrap one's head around at first, kind of like how the list functions "foldr" and "foldl" are also often confusing to newbies, even though they're like the go-to, default way to make list functions in Haskell.
The same from another angle: there are a lot of trees in the indices of SQL databases (example [1]) but we don't zoom in to that level of detail very often when defining our tables.
[1] https://www.postgresql.org/docs/current/btree.html
To implement Brown's algorithm to optimize class-based language models I had to implement a complex forest (DAG, actually) in Python using lists of fixed length. That was not especially nice to work with.
One great reason not to use recursive functions for traversing trees is that you can allocate your own stack data structure rather than relying on the call stack itself. In most languages/runtimes, the call stack has a maximum depth which limits the depth of trees you can process, usually on the order of thousands of stack frames.
Managing your own stack usually produces weirder looking code (personally I find "naive" recursive approaches more readable) - but having it as a first-class language feature could solve that!
Here's the range based for loop counterexample from the blog post, as a python generator:
[1] https://en.cppreference.com/w/cpp/coroutine/generator
Candidates: Racket, Scheme, Rust.
Defining your own iterator would be enough for most cases.
But tree traversal doesn't have this universal property. There are too many methods and purposes for traversing a tree, sufficient that IMHO no single primitive embodiment could materially improve a language. Also, modern compilers efficiently break down high-level traversal code so well that expressing the idea at a high level incurs no serious penalty compared to having a primitive for that purpose, or a series of them.
[0] https://www.hillelwayne.com/post/graph-types/ and https://news.ycombinator.com/item?id=39592444
I explored this idea with gstd, a standard library for graphs inspired by the JS Array interface: https://github.com/cdrini/gstd/
https://godbolt.org/z/fnGzszf3j
Here is how I one could write in C: https://godbolt.org/z/P3ErP4T4d or possibly with the setup code moved into the helper function for convenience: https://godbolt.org/z/soz7W5z1G
OTOH I read/heard that the beauty of array languages is that they have few data types, but they can be easily worked on. So maybe the answer is not easier tree traversal primitives, but better literature/training on how to transform it in a more manageable data type.
Sometimes the answer is to adapt our vision to our world, sometimes the answer is to adapt the world to our vision.
This doesn’t require the whole tree to be in-memory. Optimal implementation would recurse via programmer-controlled stack (instead of language-controlled) to avoid nested iterators.
Here is a TypeScript version:
A similar approach should be possible in any language which supports iterators/generators.This kind of imperative iteration seems better served by the traditional visitor design pattern: more verbose (more explicit, not more complex) and more general.
As a bonus, I think this is tail-call-optimization friendly.
[1]: https://lean-lang.org/doc/reference/latest//The-Type-System/...
For exposition, lets seperate loop initialization from the rest, make `continue` mandatory to repeat the loop with another specified value (i.e. the loop implicitly ends without a continue) and lets say `descend` to recurse into the tree.
Compare with a linear loop: The `descend` keyword differs from `continue` in that its more of a continuation - control flow will continue from that point once it is done.You could make this more functional and less imperative (and I'd be surprised if such things don't exist as libraries in some functional languages). Yes ultimately we can transform this to recursive functions and compile it that way... but for imperative languages (especially those without closures) something like the above might be useful.
Arguably the above loops are _more_ imperative and less declaritive than the standard ones from the C family. You could add a functional twist by breaking with a value (which is the result of the overall `loop` expression).
1: https://github.com/evolve75/RubyTree
But I think grafting those two together is the right answer, not inventing a new loop construct.
Trees are easy to write iterators for. DAGs are a bit harder and full graphs are an advanced interview question.
Not as ergonomic as a direct tree-iterator, but I can't see of an elegant way to introduce that in an imperative language while keeping the forking/recursion aspect clear
I guess/expect that will have to expose the implementation detail of the branching factor (yes, a binary tree is a special-case of a https://en.wikipedia.org/wiki/B-tree for B = 2, but algorithms tend to be different)
On top of that, build a library of tree traversal routines (breadth first, depth first with a few variants), allowing the user to specify code that decides, on each node visited, whether to navigate further down and whether to stop the traversal.
Now, whether the language proper needs a generic way to specify tree traversals? I doubt it, and doubt even more that it is possible too that in a way so that a) most graph traversal algorithms can use the primitive and b) the result is easier to use/looks significantly nicer than just calling those routines.
Essentially you use `first` to contain a Queue, Stack, or Level for the different traversals, and define traversal or activities from there.
It's fairly ergonomic in practice, ergonomic enough for Leetcode.
Here's a BFS: https://leetcode.com/problems/course-schedule-iv/solutions/6...
[0] https://doc.rust-lang.org/std/iter/fn.successors.html
[1] https://docs.rs/itertools/latest/itertools/fn.unfold.html
Thank you... I'll see myself out... lol =3
I think it's a better idea to write a function that applies a function to the elements of a tree. Ideally you'd make a function for each traversal order. This makes it obvious what is happening.
map_tree_pre(my_tree, my_func);
map_tree_in(my_tree, my_func);
map_tree_post(my_tree, my_func);
map_tree_bfs(my_tree, my_func);
I think allowing for starting this kind of search from a given node would cover most (though not all) of what you'd want to expose the trees structure for directly.
So something similar to `break` and `continue` should be the method of branching (descending into the tree).
I think Scala might be a great language to prototype implementation of such construct.
It allows you to specify traversal in a structure-shy way. Personally, I think it goes a little too far, but it's definitely a good inspiration.
[1] https://www2.ccs.neu.edu/research/demeter/
Presentation: https://www2.ccs.neu.edu/research/demeter/talks/overview/qua...
> This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Also I’m slightly confused by this example.
}So, our “next node” operation is to concatenate to x. Won’t we either have to have a method for modifying x to go “up” a node, or we’ll have to keep a record of what x was upon entering each node? Like in this example we’ll end up with x=“aaaaaaaa” and then go up a node, over to a “b” node, and get x=“aaaaaaaab”, right?
I guess we can delete the a node’s copy of x after all of a node’s child nodes are visited, at least.
Perhaps it can be optimized to be a little better than the recursive version, depending on how much overhead your language uses for a stack frame that it won't need for this special case.
We don't have syntax and semantics for recursion schemes in any programming language - it's always deferred to library support at best. As far as I'm concerned, this is an open problem in programing language design where we finally replace the vestigial recursion hack with a proper structured programming solution.
Or for a possibly more useful comment, constant space tree traversal is genuinely quite difficult to implement. I don't know a general solution to it (other than walk from the start N times, quadratic style), would be interested to hear of one.
There's Morris tree traversal. It basically turns an ordinary binary tree into a threaded binary tree and fixes it up after it's done. But, this is mainly possible because of the space wasted by the null pointers at the leaves of the tree. You can hide a stack within the tree.
Naively I would expect something like left-hand-rule maze traversal to be constant space. The tree may need additional structures to support this sort of travel.
Other thoughts: most state machines are constant space.... there may be something there.
That constant space used to store the size could instead be a pointer to the root, but all the use cases I have for trees involve aliasing subtrees in which case there can be multiple "parents" for a given subtree and one cannot point to it.
In the spirit of completeness, my standard tree traversal recurses to a fixed depth and then falls back to the walk-repeatedly-from-the-root, which is constant space of a sort, but feels suboptimal.
It might be a competitive approach if the tree is very unbalanced or there are enough concurrent traversals, but in main experience the main reason to use parent pointers or the like are operations that are not visiting each node once starting from the root.
If this becomes a C++ feature, imagine how many data structures we would need to support?
Many other languages, specially the FP languages, allow to do that as a library. Even the languages that are only inspired by FP. Example, Ruby:
Using the Enumerable mixin includes many FP-based methods, such as map, filter and reduce by only defining each, which in this case is DFS.Then we can proceed to define a binary tree:
Iterate over all the elements: Iterate over the even elements: Stop iteration when finding a value: And so on. The same can be done in Python, Kotlin and many others.C++ already solved that problem. Iterator are designed so that an algorithm can be written once and used with multiple data structures.
I really don't see your point
EnScript had a forall(NodeClass n in tree.GetRoot(){} construct that was very easy. It was essentially a depth-first iterator.
https://cs3110.github.io/textbook/chapters/data/trees.html
Encoding the semantics of a tree traversal operator likewise is difficult in the general case. What exactly would the order be, what if I want to traverse in a non-standard ordering, what about skipping branches; all would be difficult to cleanly represent.
I have seen it done where you return actions with key ones being recurse, stop, replace, and replace & then carry out some function, but again, this is pretty simple to implement.
Const km = 1 Const velocity = 11
And do km + velocity = 12 is all kinds of silly.
Const true = False Const false = True
And do if (true) "won't happen" else if (false) "will happen" is all kinds of silly.
Edit: Assuming your example is even statically typed.
A TXR Lisp macro for-tree for traversing a tree, given a variable, an expression for the root node, and expressions for how to to left and right relative to the variable.
A node structure that for-tree knows nothing about. No iterator abstraction or API, nothing.
Example tree (a balanced binary tree whose numeric values happen to be sorted): Walk it: Code: The left-expr and right-expr must be expressions that involve the var variable; the construct evaluates these to find the left or right child. When the value is nil it means there is no child.This is almost exactly what the blog is asking for, transliterated to Lisp syntax.
Original concept in fantasy C syntax:
Lispified: As required, the construct specifies the node variable to be the loop dummy, the root expression giving its initial value and the two accessor expressions for left and right traversal.The termination test N != NULL is made implicit in the Lisp version, so early termination requires a break out of the loop. It could be arranged.
The fantasy C syntax specifies a variable name in the navigation declarator: N : { N->left, N->right }. Presumably, the variable here can be renamed to anything you want; it just arbitrarily happens to have the same name as the loop dummy.
I didn't replicate this feature exactly because it just adds verbosity for nothing. It's okay if the navigation declarator just refers to the loop's one and only dummy variable.
Anyway, programming languages should definitely not have a tree traversal primitive. Rather, languages should be Lisp.
The left, right and key fields of a node object are accessed with like-named functions. We must call tree-root on a tree object to get to the encapsulated root node:
How about a tree of integers. Let's say the root integer is 1, and to go left is to multiply by 2, and to go right is to multiply by 2 and mask in a 1. And let's say the depth caps out at the value 16: Let's see that in binary: Let's right-align digits to see different patterns in it: Nah, left-aligned wins.Both looping and tree traversal can be done with library functions.
In general, everything that can be done with library functions, should be done with library functions.
> Doesn't for_tree(...) look a lot nicer and simpler and less error prone than needing to implement a recursive function for each operation you would want to do on a tree?
Eh, in Haskell you can derive many of these things automatically. And even Rust has some automatic derivation.
Bonus: Java's HotSpot magick will NOP (most?) methods of Null Objects, making this a zero cost abstraction.
I should probably write a concrete example for a blog or something.
TLDR: For every base class such as TreeNode, create a NullTreeNode that does nothing, then replace all uses of null with NullTreeNode. Voilá, no more null checks or NPEs.
> Doesn't for_tree(...) look a lot nicer and simpler and less error prone than needing to implement a recursive function for each operation you would want to do on a tree?
No it does not
Web browsers have this. It’s the DOM and it scares the shit out of most JavaScript developers. The fun part is confronting people about their irrational fears watching them spin their wheels with all the imagined excuses.
I am a tremendous fan of tree systems. Tree systems should completely replace inheritance in programming to enforce the composition over inheritance principle.
Programming languages should have less primitives like this and instead we should have better foundation libraries for the languages, i.e., containing iterator/-interfaces like Rust and Python (or Smalltalk).
I will be transparent here, I do not know of a non recursive, tree traversal alogorthim that uses constant storage but doesn't modify the tree.. (Morris Traversal satisfies 2 but makes temporary changes to the tree)
Most primatives do not allocate hidden data (even C++ iterators require you to instantiate the iteration state as a variable) and operate within a known stack/execution context.
Having an actual primitive that implements arbitrary tree traversal would require a substantial complier generated, but non-transparent, framework for that execution. It would either need to be recursive, have localized dynamic data, or would require tree structures that are well defined to use something like Morris. (And Morris is natively non-concurrent even for reads)
While this means that that simple primitive would actually do a lot of potentially problematic things under the hood. How many Junior programmers would call it concurrently for reads because it's not "modifying the data". Why do you need to lock? Things like brake and continue and other control mechanisms wouldn't actually work as assumed from the flat context.... I.E what happens if somebody called a goto inside of that block? Also, in an embedded context where you may not have a dynamic allocator, how would a version written as in reiterative function with a expanding stack data structure function? From a language standard point of view, there are so many complexities to building something like this that runs in any context that the language is supported. Something like this that runs in any context that the language is supported.
I mean look at C++, they have specifically and (deliberately / arguably rightfully) failed to build copy on write strings (a simpler abstraction) because there are cases where they couldn't guarantee that they would have the expected results in all contexts.
This to me sounds more like something that should be a lambda function and a standard library, not a primitive... Languages that support dictionaries and maps as part of their standard data structures already have library support for tree traversal internally. They have solved many of the problems of making that language exposed, so that would be a much smaller ask. And programmers rightfully assume a lot more complexity within library calls then they do within primitives. Primitives are designed to be primitive. Having primitives able to create stack frames, dynamically allocate hidden memory, and/or recurse honestly worries me.
1. BFS is not supported.
2. Traversal type (inorder, preorer, postorder, or even mixed order!) is also not handled.
3. The syntax doesn't make it apparent that stack overflow can occur, e.g. by doing DFS on a linked list.
(And to be clear, none of these involve sitting down and writing some generic Tree<T> structure, they're all cases of "Well, there's some tree-like datastructure that already exists or is a natural fit for this system and I'm going to be traversing it depth-first or breadth-first to do something to the elements of it or calculate something based on what nodes I pass through on a given traversal")
- Every html document is a tree structure. And css documents have special syntax to address nodes within those trees
- If it's any good, you're routing framework probably uses some kind of tree to quickly match the request to it's handler
- The database you write to uses a tree to quickly find the location it needs to write to