What are you trying to imply? That training Transformer models necessarily needs to be a continuous process? You know it’s pretty easy to stop and continue training, right?
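For the record, here's a minimal sketch of how stop-and-resume works in practice, assuming PyTorch and a standard model/optimizer setup (the function names and checkpoint path are illustrative, not any particular codebase's API):

```python
import torch

def save_checkpoint(model, optimizer, step, path="ckpt.pt"):
    # Persist everything needed to resume: weights, optimizer state
    # (momentum buffers etc.), and the training step counter.
    torch.save({
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "step": step,
    }, path)

def load_checkpoint(model, optimizer, path="ckpt.pt"):
    ckpt = torch.load(path)
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    # Resume the training loop from the saved step.
    return ckpt["step"]
```

Every serious training run checkpoints like this anyway, precisely so a crash, preemption, or deliberate pause doesn't lose progress.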
I don’t know why people keep commenting in spaces they’ve never worked in.
No datacenter is shutting off a leg, hall, row, or rack because "We have enough data, guys." Maybe at your university server room where CS majors are interning.
"And you pause training while you don't." lmao I don't know why people keep giving advice in spaces they've never worked in.