Chapter 348: Desperate Meta
There is an ancient Chinese proverb that goes, “Seven days in the mountains, thousands of years in the world.” That was pretty much how Nilanjan felt right now.
After a simple wash-up, he was taken to META’s headquarters in New York. Zuckerberg looked at him with an earnest expression: “Professor Balasubramanian, I know you have extraordinary expertise in the artificial intelligence field.”
Nilanjan thought to himself that, given how hard his surname was to pronounce, it must have taken Zuckerberg real effort to say it so clearly.
Immediately after, Zuckerberg’s words started to surprise him.
Extraordinary expertise in the artificial intelligence field? Nilanjan pondered this sentence, wondering if there was another trap, but then he thought that a big shot like Zuckerberg, a world-class tycoon, wouldn’t stoop to trapping him.
Moreover, as a professor in the artificial intelligence field at Stony Brook University, saying he had extraordinary expertise wasn’t entirely off the mark.
“I do indeed have my own insights into artificial intelligence,” Nilanjan said with a smile. The days of torment in prison over the past year or so were finally over. He was about to begin a new life. His confident smile, poised posture, and sharp mind had finally regained the upper hand.
After hearing this, Zuckerberg smiled even more happily. “As expected from Randolph’s professor. I knew you were definitely not ordinary!”
Zuckerberg had pulled Nilanjan out of prison without much effort. After all, he was a long-time staunch partner of the Donkey Party and had donated an unknown amount of money to them.
Nilanjan hadn’t really committed any crime anyway. The FBI had investigated back and forth but found no connection between Nilanjan and the Apollo Moon Landing, and no decisive evidence.
They had kept him locked up before simply because they treated him as the scapegoat. A professor of Indian descent with no background at all seemed quite suitable to take the blame for China’s first moon landing in the 21st century.
But once Zuckerberg intervened, Nilanjan became an insignificant little character again, and Zuckerberg easily got him out.
Moreover, the fact that Nilanjan had been locked up for over a year suggested he really had some skills.
“Professor Balasubramanian, what do you think of large language models?” Zuckerberg asked.
Nilanjan’s brain started racing at high speed. After all, this concerned his own safety! He had to demonstrate his value so that he could stay out on bail, or even be released without charges.
He chuckled bitterly to himself: What kind of situation is this? He was clearly innocent, yet now he had to demonstrate value to be proven innocent. What kind of country is this?
“I believe this is a direction with great development potential. The paper I published at the ACL conference a few years ago, ‘DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering,’ addresses the pain points of Transformer-based QA models, namely slow computation and high memory use caused by input-wide self-attention at every layer, and proposes DeFormer, a decomposed Transformer variant.
At lower layers, DeFormer replaces full self-attention with question-wide and passage-wide self-attention, avoiding cross-computation between question and passage sequences.
This allows independent processing of input text, enabling pre-computation of passage representations and thus substantially reducing runtime computation.
The DeFormer structure is similar to Transformer and can be directly initialized with pre-trained weights, then fine-tuned on QA datasets.
Our experiments show that DeFormer versions of BERT and XLNet achieve over 4.3x speedup on QA tasks, losing only 1% accuracy when trained with simple distillation-based losses.”
Nilanjan was referring to his paper published at the ACL conference in 2020, a classic work on LLM optimization at the time. The popular model back then was BERT, and the paper was built directly on pre-trained Transformers. Computational cost, the bottleneck of LLMs, was prominent in downstream tasks, and the paper offered a partial solution.
“There is also another work of mine from 2020, whose core logic is actually similar to that of LLMs, namely multi-layer attention.”
Nilanjan was naturally no fraud. He had indeed immersed himself in the artificial intelligence field for many years, with solid achievements and several top conference papers, all related to LLMs.
That was back in 2020, when large models were still obscure, a marginalized direction in the artificial intelligence field.
Zuckerberg had wasted a lot of money renaming Facebook to META and had misjudged the arrival time of the metaverse, but that didn’t mean he had no brain. He hadn’t sought out Nilanjan just because he was Lin Ran’s professor.
The fact that Nilanjan truly had some skills was also a very important reason.
Nilanjan had done in-depth research on the key building blocks of large models, including self-attention mechanisms, multi-head attention, and positional encoding. After all, one of his major research directions was natural language processing.
Zuckerberg was overjoyed, feeling he had found the right person.
“Professor Balasubramanian, in training LLMs, how do you handle overfitting or underfitting issues?”
“In large-scale training, pre-training involves learning general representations from massive unlabeled data, which we can do via masked language modeling or next sentence prediction. Fine-tuning then adjusts the weights on task-specific datasets to achieve transfer learning.
For overfitting, I would use regularization and dropout, such as the 0.1 dropout rate in BERT variants, and apply early stopping; for underfitting, I would increase model depth or use data augmentation.
In previous projects, I handled training instability via gradient clipping and reduced the overfitting rate from 15% to 5% on the GLUE benchmark, which helps large model training adapt more efficiently to multiple tasks.” Nilanjan was fully confident.
Asking me this—isn’t it a piece of cake?
Zuckerberg then asked some questions about parameter-efficient fine-tuning, main challenges of multimodal models, causes of hallucinations and mitigation strategies, etc., and Nilanjan answered fluently.
After listening, Zuckerberg confirmed he had found the right person.
The man had been locked up in prison for over a year and could still talk eloquently upon release, keeping up with the latest progress. He was clearly a pioneering talent in the large model field.
Moreover, he had mentored top-tier geniuses like Randolph Lin. If the student could pull off Deep Red, then under Professor Balasubramanian’s leadership, it wouldn’t be too much to ask for META to produce a Deep Blue, right?
Zuckerberg’s already smiling face broke into an even happier grin: “Professor Balasubramanian, welcome to META. In the future, you will serve as META’s chief scientist, leading us forward.”
He pressed a button on the desk, and META staff came in with a contract. Zuckerberg handed it to Nilanjan: “Professor Balasubramanian, congratulations, you will become a billionaire.”
Nilanjan took it, looked, and was stunned. An annual salary of 100 million US dollars.
This number made him a bit hesitant to sign.
Zuckerberg could get him out, so he could certainly send him back in.
With a 100 million US dollar annual salary, if he couldn’t produce results, wouldn’t he be locked up for life?
“Boss, isn’t this number a bit too high?” Nilanjan asked cautiously.
Zuckerberg was also shocked. Someone of Indian descent actually complaining that the salary was too high? META had plenty of executives of Indian descent, and no shortage of scientists either, all of whom only ever told him about their contributions while hinting at raises.
Nilanjan was the first one he had seen who thought the salary was too high.
“No, Professor Balasubramanian, rest assured, this price is not high at all.
You’ve just come out of prison and don’t yet know what has happened in the world over the past year.
If you knew what happened, you would know this number is very reasonable.”
After Zuckerberg finished speaking, Nilanjan skimmed the contract once more, then signed his name on it.
Zuckerberg smiled and shook hands with him for a commemorative photo.
The next day, META officially issued an external announcement:
“The company will appoint renowned artificial intelligence expert Professor Nilanjan Balasubramanian as the company’s chief scientist, responsible for leading research on artificial intelligence large models.
This appointment comes at the moment when generative pre-trained Transformer models have burst onto the scene, sparking a global AI revolution. Meta is committed to accelerating open-source AI innovation and promoting safer, more efficient AI technology development.
Professor Nilanjan Balasubramanian is currently a faculty member in the Computer Science Department at Stony Brook University, State University of New York, with over 15 years of experience in natural language processing (NLP) and machine learning research.
His pioneering work includes developing the DeFormer framework to optimize pre-trained Transformer models for efficiency in question answering tasks, as well as exploring event representations and attention mechanisms in user personality prediction. These achievements have been published at top conferences like ACL and AAAI and widely cited.
The professor’s expertise will boost Meta’s continuous innovation on the Llama series of large models, ensuring AI technology is more inclusive and reliable in applications for social, metaverse, and global connectivity.
As chief scientist, Professor Balasubramanian will lead the Meta AI research team, focusing on key areas such as multimodal model optimization, hallucination mitigation, and sustainable computing. His joining marks a further strengthening of Meta’s strategic investment in the AI field, aiming to provide global users with smarter, safer digital experiences.
About Meta: Meta builds technologies that help people connect, find communities, and grow businesses. Through our apps and services, we are committed to making the world more connected.”
Zuckerberg then posted on Facebook: “In the current era opened by GPT models, we need top-tier talent to lead the future of open-source AI.
Professor Nilanjan’s deep academic background and practical innovation capabilities will help us build more efficient, more responsible large models, promoting harmony between humanity and technology. I can’t wait to collaborate with Professor Nilanjan Balasubramanian!”
The market paid even more attention to Nilanjan’s other identity: PhD mentor of Randolph Lin.
In November 2022, Meta announced layoffs of 11,000 employees. In mid-March 2023, Meta announced another 10,000 layoffs.
Consecutive layoffs, focusing on AI, cost reduction and efficiency improvement—META’s external messages were very clear.
That day on the US stock market, META’s stock price surged over 7%. For a company with a market capitalization of 180 billion US dollars, just the signal from recruiting Nilanjan increased its market cap by more than 10 billion US dollars.
Nilanjan had earned a hundred years of salary.
Only when Nilanjan returned home did he learn what had happened in the outside world during his time locked up in prison.
“What? Large models have become all the rage? Now technology giants all have to mention large models at every turn?”
“GPT is too strong. I previously thought the LLM path had great potential, and sure enough.”
“Why do I feel Deep Red is even better than GPT?”
Deep Red only allows registration with Chinese mobile phone numbers and is not open to registration from overseas, adopting a strategy similar to GPT’s of opening only to specific regions.
Therefore, Deep Red cannot be used abroad, but YouTube and TikTok are full of videos of people using Deep Red.
After all, plenty of foreigners in China and Chinese people studying abroad share these on the overseas internet.
The Chinese internet and the overseas internet are isolated to some degree, but the barrier is very thin, just a sheet of paper.
AI enthusiasts overseas were drooling over Deep Red: this thing is free and seems better than GPT.
GPT-4 requires payment.
Nilanjan felt immense pressure inside. He wasn’t even sure he could match GPT-4, let alone Deep Red.
He didn’t dare call Lin Ran either. His pride might allow it, but the experience of being taken away by the FBI made him afraid to do so.
What if they pinned another fabricated charge on him later? Wouldn’t that be the end?
Now he was a top-tier worker with a 100 million US dollar annual salary, the chosen one of the Indian diaspora. Even Indian newspapers called him the father of Deep Red, believing Deep Red was technology Lin Ran had stolen from Nilanjan.
Indian media was just that confident.
Some Indian newspapers even called for Nilanjan to pass the technology to Indian companies. China has Deep Red, India needs a White Elephant too!
This was due to the competition between India and China, known as the Dragon-Elephant Conflict.
Nilanjan started making calls, summoning his friends of Indian descent to META. Only a herd of elephants can exert maximum power!
A single elephant in the jungle has no deterrence, but a herd of elephants in the jungle—even the king of beasts has to retreat.
As META’s chief scientist, besides his compensation, he also had the right to recruit people.
META’s LLM team would gather the power of the elephant herd. Then Nilanjan thought, forget it, I still need to recruit some hardworking people of Chinese descent. Let me think: which professors of Chinese descent am I close to? I’ll have them recommend elite students to do the work, top-tier elites only, from Tsinghua University and Yenching University. No, SJTU works too.
After regaining his senses, Nilanjan knew that if a team consisted entirely of people of Indian descent, it would face a problem: everyone would think only about strategy, with no one handling tactical execution, which obviously wouldn’t work.
Nilanjan moved to California with his family. As for the bail condition that he not leave New York, Zuckerberg used his financial clout to have it changed to not leaving America.
The AI laboratory was META’s most watched department with the most resources, and Zuckerberg visited it for inspection practically every day. He noticed that the number of people of Indian and Chinese descent there was growing, while the number of white people was shrinking.
A month later, the entire artificial intelligence research and development office, around 200 people, consisted only of people of Indian and Chinese descent, with no white people at all.
“Very good, we are on the right path,” Zuckerberg thought.
If Zuckerberg discovering the New World and finding Nilanjan as chief scientist was like high mountains and flowing water finding a soulmate, successfully finding his own Zhong Ziqi, then Baidu fell into an unprecedented trough.
The sudden emergence of Deep Red caused Wenxin Yiyan to become roadside trash before it was even born.
Wenxin Yiyan had only held a press conference and opened applications; Baidu granted internal testing slots based on those applications.
As a result, comparison videos between Wenxin Yiyan and Deep Red were posted by uploaders on Bilibili. The contrast was too stark, like an elementary student next to a PhD student. Baidu couldn’t hold it together and fully shut down internal testing the next day.
Released? Yes, released. Can users actually use it? No. All show and no substance.
Internal testing became purely internal, no testing.
Baidu faced an unprecedented crisis; even internally, they didn’t know what to do.
Because their original plan was to charge fees, like GPT. Wenxin Yiyan had to charge 50 US dollars a month. It would be the first artificial intelligence large model and the first to charge, then the first to break even, after which Baidu would expand investment, improve user experience, kill off competitors, and unify the artificial intelligence large model market on the Chinese Internet.
That was Baidu’s dream: a positive cycle that kept snowballing.
But it got stuck at the first step.
Charging for this elementary student model?
Four days after Deep Red’s launch, user count broke 10 million. Under the celebratory Weibo post for Deep Red’s user count breaking 10 million, the top-liked comment was an image: left side Wenxin Yiyan, right side Deep Red, then two numbers below:
“10000000:0”
Meaning Deep Red already had 10 million users, while Wenxin Yiyan had 0. The latter had a grand press conference, while the former only had an 8-second short video.
This contrast was so obvious that the shock from the numbers themselves hammered the first nail into Baidu’s coffin in the minds of netizens.
The pressure inside Baidu was immense, close to bursting.
Starting from the day after Wenxin Yiyan’s release, a heavy, oppressive mood hung over the entire Baidu building. No one dared speak loudly, afraid of angering the big boss and getting laid off on the spot.
In the Baidu building’s meeting room, all executives gathered, and besides them, Baidu’s internal artificial intelligence field experts were all there too.
“Boss, Tencent isn’t just attacking Wenxin Yiyan; this is a slap in our face! This is blatant provocation!” one executive said. “I believe someone must take responsibility this time. Clearly, someone misjudged the technology, underestimated the situation, and rushed to seize the market for credit before Wenxin Yiyan was mature.”
The speaker, an executive in charge of administrative work, was pointing the spear straight at CTO Wang Haifeng.
Wang Haifeng sincerely apologized: “I did misjudge the situation. I didn’t expect Tencent’s progress to be this fast, catching up with or even surpassing OpenAI in two months.”
Before Wang Haifeng finished, Robin interrupted: “Hey, Haifeng, alright. I think no one in this world could have predicted this beforehand.
This is just like Professor Lin Ran’s moon landing over a year ago: they accomplished an impossible miracle. I won’t blame subordinates for this; it’s a defeat through no fault of our own.”
With Robin saying this, everyone knew the meeting wasn’t about finding fault or internal struggle. Put more bluntly, it wasn’t about finding someone to take the blame, but about solving the problem.
“Haifeng, how long until we catch up?” Robin asked.
Wang Haifeng said honestly: “I estimate by July we can reach 80% of GPT-3’s performance, by December match GPT-3’s performance, and by next July reach Deep Red’s performance.”
After a pause, Wang Haifeng said: “The worst thing about Deep Red is that it isn’t open-source, so we can’t reference it.”
Not plagiarism, referencing!
“GPT has been closed-source since GPT-3, but GPT-1 and GPT-2 were open-source.
Open-source means we at least know how it’s done, what technologies it uses. Even if we don’t know specific technical details or engineering implementations, we can grind it out over time.
But Deep Red is different—it’s a complete black box. We don’t even know if it’s on the same technical path as GPT.
We don’t know whether Deep Red’s outstanding performance in Chinese contexts, with its exemplary handling of Chinese text, comes from existing technology or is Professor Lin Ran’s original creation.
I believe we must do one thing now.”
“What?” Robin asked.
“Poach people. We must poach from Deep Red to figure out exactly how they did it. The more intelligence we have, the faster we catch up. Even knowing just the general direction is better than groping blindly as we are now.
Moreover, Robin, we are a giant company. A giant company’s biggest advantage is resources and cash.
We need to figure out how Deep Red is done. We need to poach from both OpenAI and Deep Red simultaneously, then have two internal teams advancing separately along GPT and Deep Red technical paths.
Resources will then be scheduled and prioritized based on each team’s performance.”
This is standard operation for giant companies, a scheme played out in the open.
If a small company has a product, I copy it directly, since you can’t resist anyway. If there are core engineering problems I can’t solve in the short term, and I can’t afford to delay seizing the market, I poach your people.
“I know this is certainly one way, but the problem is that Deep Red’s progress is too fast.” Robin frowned, his handsome gentlemanly temperament gone. “In past competitions, user growth was linear and slowed after a certain point.
As latecomers, with their growth slowing and ours skyrocketing, we could catch up.
But large model growth isn’t logarithmic or linear; it’s exponential.
Even if we poach successfully and get their core engineers, could we launch something like Deep Red in half a year at the fastest?”
Wang Haifeng chuckled bitterly inside: Half a year? That would only be possible if we poached Professor Lin Ran himself.