Are R1-generated outputs good to use as training data?
Are they good for training the next model? For example, any problem that R1 solves could be used as training data to get a better model. That better model could then be used to generate training data again, and so on, with each model getting better than the last. Would this lead to self-improvement? Is this right?
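To make the loop I have in mind concrete, here is a rough Python sketch. Everything in it is a hypothetical stand-in and not how R1 or any real training pipeline works: the "model" is just a list of collected examples, the problems are toy integer additions, and `toy_generate`, `toy_verify`, and `toy_fine_tune` are made-up placeholders for sampling from the model, checking an answer, and training on the kept data.

```python
from typing import Callable, List, Tuple

Problem = Tuple[int, int]        # toy problem: add two integers
Example = Tuple[Problem, int]    # (problem, answer) pair kept for training
Model = List[Example]            # hypothetical stand-in for model weights


def collect_verified(
    model: Model,
    problems: List[Problem],
    generate: Callable[[Model, Problem], int],
    verify: Callable[[Problem, int], bool],
) -> List[Example]:
    """Generate one answer per problem and keep only those that pass the check."""
    kept: List[Example] = []
    for problem in problems:
        answer = generate(model, problem)
        if verify(problem, answer):
            kept.append((problem, answer))
    return kept


def self_improve(
    model: Model,
    problems: List[Problem],
    generate: Callable[[Model, Problem], int],
    verify: Callable[[Problem, int], bool],
    fine_tune: Callable[[Model, List[Example]], Model],
    rounds: int,
) -> Model:
    """Repeat: generate, filter by correctness, train on what survived."""
    for _ in range(rounds):
        data = collect_verified(model, problems, generate, verify)
        model = fine_tune(model, data)
    return model


# --- Toy stand-ins (assumptions for illustration only) ---

def toy_generate(model: Model, problem: Problem) -> int:
    # Deliberately wrong whenever the true sum is odd; ignores the model.
    total = problem[0] + problem[1]
    return total if total % 2 == 0 else total + 1


def toy_verify(problem: Problem, answer: int) -> bool:
    # External check against the known correct sum.
    return answer == problem[0] + problem[1]


def toy_fine_tune(model: Model, data: List[Example]) -> Model:
    # Placeholder "training": just append the verified examples.
    return model + data
```

In this toy version the filter only ever returns problems the generator already gets right, so nothing new is learned from round to round. Whether a real fine-tuned model would generalize beyond the problems it already solves is the part I'm unsure about.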